Diagnostic sequencing by a combination of specific cleavage and mass spectrometry

ABSTRACT

The present invention is in the field of nucleic acid-based diagnostic assays. More particularly, it relates to methods useful for the “diagnostic sequencing” of regions of sample nucleic acids for which a prototypic or reference sequence is already available (also referred to as “re-sequencing”), or which may be determined using the methods described herein. This diagnostic technology is useful in areas that require such re-sequencing in a rapid and reliable way: (i) the identification of the various allelic sequences of a certain region/gene, (ii) the scoring of disease-associated mutations, (iii) the detection of somatic variations, (iv) studies in the field of molecular evolution, (v) the determination of the nucleic acid sequences of prokaryotic and eukaryotic genomes, (vi) identifying one or more nucleic acids in one or more biological samples, and (vii) determining the expression profile of genes in a biological sample, as well as other areas.

This application claims the benefit of Provisional Application No. 60/131,984, filed Apr. 30, 1999.

FIELD OF INVENTION

The present invention is in the field of nucleic acid-based diagnostic assays. More particularly, it relates to methods useful for the “diagnostic sequencing” of regions of sample nucleic acids for which a prototypic or reference sequence is already available (also referred to as “re-sequencing”), or which may be determined using the methods described herein. This diagnostic technology is useful in areas that require such re-sequencing in a rapid and reliable way: (i) the identification of the various allelic sequences of a certain region/gene, (ii) the scoring of disease-associated mutations, (iii) the detection of somatic variations, (iv) studies in the field of molecular evolution, (v) the determination of the nucleic acid sequences of prokaryotic and eukaryotic genomes, (vi) identifying one or more nucleic acids in one or more biological samples, and (vii) determining the expression profile of genes in a biological sample, as well as other areas.

BACKGROUND OF INVENTION

Complete reference genome sequences for a number of model organisms as well as humans are currently available or are expected to become available in the near future. A parallel challenge is to characterize the type and extent of variation in the sequences of interest because it underlies the heritable differences among individuals and populations. In humans, the vast majority of sequence variation consists of nucleotide substitutions referred to as single nucleotide polymorphisms (SNPs). DNA sequencing is the most sensitive method to discover polymorphisms [Eng C. and Vijg J. et al., Nature Biotechnol. 15: 422–426 (1997)]. A growing panel of such sequence variants, together with powerful methods to monitor them [Landegren U. et al., Genome Res. 8: 769–776 (1998)], is useful in linkage studies to identify even the most subtle disease susceptibility loci [Lander E. and Schork N., Science 265: 2037–2048 (1994); Risch N. and Merikangas K., Science 273: 1516–1517 (1996)]. Also, the identification of all (functional) allelic variants will require the re-sequencing of particular regions in a large number of samples [Nickerson D. et al., Nature Genet. 19: 233–240 (1998)]. Although a number of methods to monitor known SNPs have been developed [Landegren U. et al., Genome Res. 8: 769–776 (1998)], re-sequencing is likely to be routinely applied to secure diagnoses of patients. Indeed, in a significant number of disease-associated genes that have been surveyed thus far, literally hundreds or even thousands of different mutations have been identified and catalogued. Consequently, sequence determination represents the ultimate level of resolution and may be the preferred method to monitor which mutation or combination of mutations, out of a large number of mutations of known clinical relevance, is present.

It would appear that the field of biomedical genetics will rely heavily on sequencing technology. Hence, there is a need for advanced sequencing methods that are time- and cost-competitive, and at the same time accurate and robust. Recent developments in this area include improvements to the basic dideoxy chain termination sequencing method [Sanger et al. Proc. Natl. Acad. Sci. USA 74: 5463–5467 (1977); reviewed by Lipshutz R. and Fodor S. et al., Current Opinion in Structural Biology 4: 376–380 (1994)], as well as new approaches that are based on entirely new paradigms. Two such novel approaches are sequencing-by-hybridization (SBH) [Drmanac R. et al., Science 260: 1649–1652 (1993)] and pyro-sequencing [Ronaghi M. et al., Science 281: 363–365 (1998); Ronaghi M. et al., Anal. Biochem. 242: 84–89 (1996)]. While the concepts of these approaches have been experimentally validated, their ultimate acceptance and usage may depend on the type of application, e.g., de novo sequencing, re-sequencing, and genotyping of known SNPs.

Recently, progress has also been made in the use of mass spectroscopy (MS) to analyze nucleic acids [Crain, P. F. and McCloskey, J. A., Current Opinion in Biotechnology 9: 25–34 (1998), and references cited therein]. One promising development has been the application of MS to the sequence determination of DNA and RNA oligonucleotides [Limbach P., Mass Spectrom. Rev. 15: 297–336 (1996); Murray K., J. Mass Spectrom. 31: 1203–1215 (1996)]. MS, and more particularly matrix-assisted laser desorption/ionization MS (MALDI MS), has the potential of very high throughput due to high-speed signal acquisition and automated analysis off solid surfaces. It has been pointed out that MS, in addition to saving time, measures an intrinsic property of the molecules, and therefore yields a significantly more informative signal [Köster H. et al., Nature Biotechnol., 14: 1123–1128 (1996)].

Sequence information can be derived directly from gas-phase fragmentation [see for example Nordhoff E. et al., J. Mass Spectrom., 30: 99–112 (1995); Little D. et al., J. Am. Chem. Soc., 116: 4893–4897 (1994); Wang B. et al., WO 98/03684 and WO 98/40520; Blocker H. et al., EP 0 103 677; Foote S. et al., WO 98/54571]. In contrast, indirect methods measure the mass of fragments obtained by a variety of methods in the solution phase, i.e., prior to the generation of gas phase ions. In its simplest form, mass analysis replaces the gel-electrophoretic fractionation of the fragment-ladder (i.e., a nested set of fragments that share one common endpoint) generated by the sequencing reactions. The sequencing reactions need not necessarily be base-specific because the base-calling may also be based on accurate mass measurement of fragments that terminate at successive positions and that differ from one another by one nucleotide residue. The fragment-ladder can be generated by the Sanger method [Köster H. et al., Nature Biotechnol., 14: 1123–1128 (1996); Reeve M. A., Howe R. P., Schwarz T., U.S. Pat. No. 5,849,542; Köster H., U.S. Pat. No. 5,547,835; Levis R. and Romano L., U.S. Pat. No. 5,210,412 and U.S. Pat. No. 5,580,733; Chait B. and Beavis R., U.S. Pat. No. 5,453,247], by base-specific partial RNA digestion [Hahner S. et al., Nucleic Acids Res., 25: 1957–1964 (1997); Köster H., WO 98/20166] or by chemical cleavage [Isola N. et al., Anal. Chem., 71: 2266–2269 (1999); references cited in Limbach P., Mass Spectrom. Rev., 15: 297–336 (1996)]. An alternative method consists of analyzing the ladder generated by exonuclease digestion from either the 3′- or 5′-end [Pieles U. et al., Nucleic Acids Res., 21: 3191–3196 (1993); Köster H., U.S. Pat. No. 5,851,765; Engels J. et al., WO 98/45700; Tarr G. and Patterson D., WO 96/36986; Patterson D., U.S. Pat. No. 5,869,240].

A severe limitation of both the direct and indirect MS methodologies under the current performance conditions is the poor applicability to chain lengths beyond ~30–50 nucleotides. As a consequence, it has been suggested that the prospects for MS lie with DNA diagnostic assays, rather than large-scale sequencing [Smith L., Nature Biotechnol., 14: 1084–1087 (1996)]. Given the fact that MS represents an exquisite means to analyze short nucleotide fragments, the various MS-based processes that have been described for nucleic acid based diagnostic purposes generally involve the derivation and analysis of such relatively short fragments [see for example Köster H., WO 96/29431; Köster H. et al., WO 98/20166; Shaler T. et al., WO 98/12355; Kamb A., U.S. Pat. No. 5,869,242; Monforte J. et al., WO 97/33000; Foote S. et al., WO 98/54571].

Some of the MS-based assays have been used for the scoring of defined mutations or polymorphisms. Other processes derive multiple oligonucleotide fragments and yield a ‘mass-fingerprint’ so as to analyze a larger target nucleic acid region for mutations and/or polymorphisms. The latter MS analyses are, however, considerably less informative in that they are essentially restricted to the detection of sequence variations. The methods cannot be applied to diagnostic sequencing of nucleic acids, where the term diagnostic sequencing means the unequivocal determination of the presence, the nature and the position of sequence variations. At best, the measurements confirm the base composition of small fragments whose masses are determined with sufficient accuracy to reduce the number of possible compositional isomers. Also, it will be realized that only certain changes in composition (as revealed by shifts in the mass spectrum) can be unambiguously assigned to a polymorphism or mutation. A match between the spectrum of the interrogated sequence and a reference-spectrum obtained from wild-type sequence or sequences known to contain a given polymorphism is assumed to indicate that the interrogated nucleic acid region is wild-type or incorporates the previously known polymorphisms, thereby disregarding certain other possible interpretations.

While most methods in the art do yield sequence-related information, they do not disclose that a combination of several different mass spectra, obtained after complementary digestion reactions, allows for the effective survey of a nucleic acid region and provides an unambiguous assignment of both known as well as previously unknown sequence variations that occur relative to a reference nucleic acid with a known nucleotide sequence.

In view of the limitations of the methods described above, the art would clearly benefit from a new procedure for the diagnostic sequencing of nucleic acids that would overcome the shortcomings of the processes discussed above.

In comparison with conventional sequencing technology, i.e., the gel-electrophoretic analysis of fragment ladders, the methods of the present invention are more suited for the simultaneous analysis of multiple target sequences. In general, each particular sequence or sequence variant is associated with a distinct set of mass peaks. Consequently, the sequencing reactions according to the methods of the present invention lend themselves readily to (i) multiplexing (i.e., the analysis of two or more non-contiguous target regions from a single biological sample), (ii) the analysis of heterozygous samples, as well as (iii) pooling strategies (i.e., the simultaneous sequencing of the analogous regions derived from two or more different biological samples).

Because of the multiplex capacity, the present methods can be adapted as a tool for the genome-wide discovery and scoring of polymorphisms (e.g., SNPs) useful as markers in genetic linkage studies. The unambiguous identification/diagnosing of a number of variant positions is less demanding than full sequencing and, consequently, a considerable number of target genomic loci can be combined and analyzed at the same time, especially when their lengths are kept relatively small. The number of markers that can be scored in parallel will depend on the level of genetic diversity in the species of interest and on the precise method used to prepare and analyze the target nucleic acids, but may typically be on the order of a few tens up to 100 with current MS capabilities. The addition of multiplexing to the high-precision and high-speed characteristics of MS constitutes a new marker technology that enables the large-scale and cost-effective scoring of several (tens of) thousands of markers. Some aspects of the application of the present methods to genome-wide genotyping are described in Example 5.

Sequencing reactions according to the methods of the present invention yield, in principle, a discrete set of fragments for each individual sequence or sequence variant, whereas conventional sequence ladders stack on top of one another. Therefore, such sequences or sequence variants can be analyzed even when present as a lesser species. This is a useful quality for the analysis of clinical samples, which are often genetically heterogeneous because of the presence of both normal and diseased cells or because the diseased material is itself heterogeneous (e.g., cancerous tissue, viral quasi-species). Additionally, the ability to detect mutations at a low ratio of mutant over wild-type allele makes it practicable to pool individual biological samples, a strategy which should permit a more cost-effective search for genomic sequence variations in a population.

The present invention rests in part on the insight that integration of the data obtained in a set of complementary fingerprints produced by an appropriate set of complementary cleavage reactions of the invention represents a level of characterization of a sample nucleic acid essentially equal to sequence determination. The present invention is also directed to the use of cleavage protocols that result in the generation of cleavage products that range from mono- and dinucleotides to fragments of a few tens of nucleotides that are particularly suited for analysis by MS. At the same time, the present method is distinct from the other fragmentation processes that are limited to screening target nucleic acids for a wide range of potential mutations. According to the present invention, a combination of several different mass spectra, obtained after complementary digestion reactions, coupled with systematic computational analysis allows the survey of a selected target nucleic acid or region thereof and leads to the unambiguous assignment of both known and previously unknown sequence variations. In certain aspects of the present invention, knowledge of the reference sequence in combination with the methods disclosed herein allows modeling of the experimental approach, anticipation of potential ambiguities, and the design of an adequate resolution.
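By way of illustration only, the modeling of a base-specific cleavage reaction from a known sequence might be sketched as follows. The sketch assumes complete digestion 3′ of the chosen base, uses standard average residue masses rounded to two decimals, and treats every fragment (including the 3′-terminal one) as carrying a 5′-hydroxyl and a 3′-phosphate group. The sequence and the function names `cleave_after` and `neutral_mass` are hypothetical and are not taken from the disclosure.

```python
# Average masses (Da) of ribonucleotide residues within a chain
# (nucleoside monophosphate minus one water), rounded to two decimals.
RESIDUE_MASS = {"A": 329.21, "C": 305.18, "G": 345.21, "U": 306.17}
WATER = 18.02  # added once per fragment with 5'-OH and 3'-phosphate ends


def cleave_after(seq, bases):
    """Split seq 3' of every occurrence of the given bases (complete digestion)."""
    fragments, start = [], 0
    for i, base in enumerate(seq):
        if base in bases:
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])  # 3'-terminal remainder
    return fragments


def neutral_mass(fragment):
    """Neutral average mass, assuming 5'-hydroxyl and 3'-phosphate ends."""
    return round(sum(RESIDUE_MASS[b] for b in fragment) + WATER, 2)


transcript = "GGAUCCAGUACGUUAGCA"  # hypothetical sequence, for illustration only
for fragment in cleave_after(transcript, "G"):  # G-specific, as with RNase-T1
    print(fragment, neutral_mass(fragment))
```

Replacing `"G"` by `"A"` models an A-specific reaction on the same sequence; running both gives two different fragment sets for the same sequence.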

SUMMARY OF INVENTION

The present invention is directed to a mass spectroscopic method for detecting or analyzing a particular nucleic acid sequence. The present invention is useful for de novo sequencing or re-sequencing nucleic acid in a rapid and reliable way which permits, for example, the identification of the various allelic sequences of a certain region/gene, the identification and scoring of disease-associated mutations, the detection of somatic variations, determining genetic diversity in molecular evolution, and the determination of the genomic sequences, e.g., of viral and bacterial isolates. The present invention is also useful for identification of all nucleic acid molecules in one or more biological samples, including for expression profiling, i.e., identification of all the mRNA species that are expressed in a given cell at a given time, by rapidly determining the sequence of the mRNA that is expressed.

In one embodiment, the present invention is directed to methods for sequence analysis of one or more target nucleic acids for which a known reference nucleic acid sequence is available. In this method, one or more target nucleic acids derived from one or more biological samples, and a reference nucleic acid, are each subjected to complementary cleavage reactions, and the products of the cleavage reactions are analyzed by mass spectroscopic methods. The mass spectra of the one or more target nucleic acids are then compared with the mass spectra of the reference nucleic acid sequence, and the nucleotide sequence of the one or more target nucleic acids is deduced by systematic computational analysis.
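As a non-limiting sketch of the comparison step, the mass peaks expected for a reference and for a target can be computed for two complementary base-specific reactions and then compared. The sequences, the residue masses rounded to two decimals, the idealized complete digestion, and the helper names `digest`, `spectrum`, and `spectral_changes` are assumptions made for this illustration only; instrument resolution and peak intensities are ignored.

```python
# Average ribonucleotide residue masses (Da); WATER accounts for the
# 5'-hydroxyl and 3'-phosphate ends assumed for every fragment.
RESIDUE_MASS = {"A": 329.21, "C": 305.18, "G": 345.21, "U": 306.17}
WATER = 18.02


def digest(seq, base):
    """Fragments produced by complete cleavage 3' of every `base`."""
    fragments, start = [], 0
    for i, b in enumerate(seq):
        if b == base:
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])
    return fragments


def spectrum(seq, base):
    """Set of distinct fragment masses (an idealized peak list)."""
    return {round(sum(RESIDUE_MASS[b] for b in f) + WATER, 2)
            for f in digest(seq, base)}


def spectral_changes(reference, target, base):
    """Peaks lost from and gained relative to the reference spectrum."""
    ref, tgt = spectrum(reference, base), spectrum(target, base)
    return sorted(ref - tgt), sorted(tgt - ref)


reference = "AGUCAGGAUCCGAUACG"  # hypothetical reference sequence
target = "AGUCAGGAUACGAUACG"     # hypothetical variant: C to A at position 10

for base in ("G", "A"):  # two complementary base-specific reactions
    lost, gained = spectral_changes(reference, target, base)
    print(f"{base}-specific: lost {lost}, gained {gained}")
```

In this hypothetical example the G-specific reaction only loses a peak, because the altered fragment coincides in mass with a fragment already present, whereas the A-specific reaction also gains a new peak from the cleavage site created by the substitution. The two reactions taken together are therefore more informative than either one alone.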

In one aspect of this embodiment, multiple targets, such as cDNA clones, are prepared from the mRNA of the same biological sample, and are separately analyzed as above in parallel experiments. In a second aspect, multiple targets are derived from the same biological sample and are analyzed simultaneously, for example in genome-wide genotyping.

The one or more target nucleic acids may be selected from the group consisting of a single stranded DNA, a double stranded DNA, a cDNA, a single stranded RNA, a double stranded RNA, a DNA/RNA hybrid, and a DNA/RNA mosaic nucleic acid.

In a second embodiment, the one or more target nucleic acids are selected from the group consisting of an amplified nucleic acid fragment, a cloned nucleic acid fragment, and a series of non-contiguous DNA fragments from the genome. In one aspect of this invention, the amplified one or more target nucleic acids are derived by one or more consecutive amplification procedures selected from the group consisting of in vivo cloning, the polymerase chain reaction (PCR), reverse transcription followed by the polymerase chain reaction (RT-PCR), strand displacement amplification (SDA), and transcription based processes.

In a preferred embodiment, the amplified nucleic acid fragments are RNA transcripts generated from one or more target nucleic acids or a reference nucleic acid by a process comprising the steps of: (a) amplifying the one or more target nucleic acids or the reference nucleic acid using one or more primers corresponding to a region that is complementary to the one or more target nucleic acids or the reference nucleic acid and encoding an expression control sequence using any one of the amplification procedures described above, and (b) generating RNA transcripts from the amplified one or more target nucleic acids or reference nucleic acid using one or more RNA polymerases that recognize the transcription control sequence on the target or reference nucleic acid. The RNA generated by the above process is then subjected to complementary cleavage reactions to generate nucleic acid fragments, which are then analyzed by MS. The transcription control sequence may be selected from the group consisting of a eukaryotic transcription control sequence, a prokaryotic transcription control sequence, and a viral transcription control sequence. The prokaryotic transcription control sequence may be selected from the group consisting of T3, T7, and SP6 promoters. The cognate RNA polymerases may be either a wild-type or a mutant form capable of incorporating non-canonical substrates with a 2′-substituent other than a hydroxyl group.

In a third embodiment, the one or more target nucleic acids are amplified using mass modified nucleoside triphosphates. The mass modified nucleoside triphosphates may be selected from the group consisting of a mass modified deoxynucleoside triphosphate, a mass modified dideoxynucleoside triphosphate, and a mass modified ribonucleoside triphosphate. The mass modified nucleoside triphosphate may be modified on the base, the sugar, and/or the phosphate moiety, and the modifications are introduced through an enzymatic step, chemically, or a combination of both. In one aspect, the modification may consist of 2′-substituents other than a hydroxyl group on transcript subunits. In another aspect, the modification may consist of phosphorothioate internucleoside linkages or phosphorothioate internucleoside linkages further reacted with an alkylating reagent. In yet another aspect, the modification may consist of a methyl group on C5 of the uridine-5′-monophosphate subunits. Such modifications may alter the specificity of cleavage by certain reagents, and/or the mass of the cleavage products, and/or the length of the cleavage products.
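The effect of a 2′-modification on cleavage specificity can be illustrated with a minimal sketch. It assumes that ribonuclease A cleaves 3′ of C and U residues only where a 2′-hydroxyl group is present, so that a transcript incorporating a 2′-deoxy analogue of one pyrimidine is cleaved only at the other. Deoxy residues are written in lower case; this notation, the sequence, and the function names are hypothetical conventions of the sketch.

```python
def incorporate_deoxy(seq, base):
    """Mark every occurrence of `base` as a 2'-deoxy residue (lower case)."""
    return seq.replace(base, base.lower())


def rnase_a_fragments(seq):
    """Cleave 3' of ribo-C and ribo-U only; lower-case (2'-deoxy) residues
    lack the 2'-hydroxyl group and are assumed to resist cleavage."""
    fragments, start = [], 0
    for i, residue in enumerate(seq):
        if residue in ("C", "U"):
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])
    return fragments


transcript = "GGAUCCAGUACGUUAGCA"  # hypothetical sequence

print(rnase_a_fragments(transcript))                          # U/C-specific
print(rnase_a_fragments(incorporate_deoxy(transcript, "C")))  # U-specific
print(rnase_a_fragments(incorporate_deoxy(transcript, "U")))  # C-specific
```

With the deoxy analogue incorporated, the same enzyme yields fewer and longer fragments, which is one way in which such a modification can change both the specificity of cleavage and the length of the cleavage products.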

In one aspect of the invention, the one or more target nucleic acids and reference nucleic acid are subjected to complementary cleavage reactions using enzymatic cleavage, chemical cleavage, and/or physical cleavage reactions. In a preferred embodiment, the one or more target nucleic acids and the reference nucleic acid are subjected to enzymatic cleavage reaction using one or more enzymes selected from the group consisting of endonucleases and exonucleases. In a more preferred embodiment, the target nucleic acid is a double-stranded RNA and the endonuclease used is a ribonuclease. The ribonuclease may be selected from the group consisting of the G-specific T₁ ribonuclease, the A-specific U₂ ribonuclease, the A/U specific phyM ribonuclease, the U/C specific ribonuclease A, the C-specific chicken liver ribonuclease (RNaseCL3), and cusativin. In one aspect of this preferred embodiment, the target nucleic acid is a phosphorothioate-modified single-stranded DNA or RNA and the endonuclease is nuclease P1.

In another aspect, the mass spectroscopic analysis of the nucleic acid fragments is performed using a mass spectrometer selected from the group consisting of Matrix-Assisted Laser Desorption/Ionization-Time-of-flight (MALDI-TOF), Electrospray-Ionization (ESI), and Fourier Transform-Ion Cyclotron Resonance (FT-ICR). In a preferred embodiment, the mass spectrometer used for the analysis of the cleavage fragments is MALDI-TOF.

In a fifth embodiment, the method of the present invention can be used for diagnosing nucleic acid sequence variations in one or more target nucleic acids derived from a biological sample, for which a known reference nucleic acid sequence is available. In this method, one or more target nucleic acids derived from a biological sample, and a reference nucleic acid whose sequence has been predetermined, are subjected to complementary cleavage reactions, and the products of the cleavage reactions are analyzed by mass spectroscopic methods. The mass spectra of the one or more target nucleic acids are then compared with the mass spectra of the reference nucleic acid, and the nucleotide sequence variations in the one or more target nucleic acids are then deduced by systematic computational analysis of the sequence variations between the one or more target nucleic acids and the reference nucleic acid. A variety of nucleic acid sequence variations, including deletions, substitutions and/or insertions in a target nucleic acid, can be determined using the method of the present invention.

In a sixth embodiment, the method of the present invention can be used for scoring known nucleotide sequence variations in one or more target nucleic acids derived from a biological sample, for which a known reference nucleic acid sequence is available. In this embodiment, one or more target nucleic acids derived from a biological sample, and a reference nucleic acid, are subjected to complementary cleavage reactions, and the products of the cleavage reactions are analyzed by mass spectroscopic methods. The mass spectra of the one or more target nucleic acids are then compared with the mass spectra of the reference nucleic acid sequence, and the nucleotide sequence variations/mutations in the one or more target nucleic acids are scored by comparing the nucleic acid sequences of the one or more target nucleic acids and the reference nucleic acid by systematic computational analysis.

In a seventh embodiment, the method of the present invention can be used for determining the nucleotide sequence (de novo sequencing) of one or more target nucleic acids derived from a biological sample for which no reference sequence is available. In this method, a target nucleic acid derived from a biological sample is subjected to complementary cleavage reactions, and the products of the cleavage reactions are analyzed by mass spectroscopic methods. The mass spectra of the one or more target nucleic acids, coupled with a systematic computational analysis, are then used to deduce the sequence of the one or more target nucleic acids.

In an eighth embodiment, the method of the present invention can be used for genome-wide genotyping of one or more known or unknown target nucleic acids. In this method, one or more target nucleic acids, derived from a biological sample, are amplified and then subjected to complementary cleavage reactions. In one aspect, multiple targets are derived from a single sample and are analyzed simultaneously. The products of the cleavage reactions are then analyzed by mass spectroscopic methods. The mass spectra of the one or more known or unknown target nucleic acids are compared with the mass spectra of a reference nucleic acid. This comparison is then used to infer the genotype of an organism from which the biological sample is derived and to determine therefrom the genetically relevant nucleic acid sequence variations of the one or more known or unknown nucleic acids.

In a ninth embodiment, the method of the present invention can be used to identify one or more target nucleic acids in one or more biological samples. In this method, one or more target nucleic acids, derived from a biological sample, are amplified and then subjected to complementary cleavage reactions. In one aspect, multiple targets are derived from a single sample and are analyzed simultaneously. The products of the cleavage reactions are then analyzed by mass spectroscopic methods. The identity of one or more target nucleic acids is deduced by comparing the mass spectra of the one or more known or unknown target nucleic acids with each other or by comparison with a plurality of mass spectra of reference nucleic acids.

In one aspect, the method of the present invention can be used for expression profiling, i.e., identifying the various mRNAs expressed in one or more biological samples.

Also encompassed by the present invention is a kit for sequence analysis of one or more target nucleic acids using mass spectroscopy, the kit comprising a container having one or more sets of reference nucleic acids for which the nucleotide sequence is known, one or more nucleic acid cleaving agents, and computer algorithm/software for comparing the mass spectra of the one or more target nucleic acids with the mass spectra of the reference nucleic acid and deducing therefrom the nucleic acid sequence of the one or more target nucleic acids. In one embodiment, the nucleic acid cleaving agent in the kit is a chemical agent. In an alternate embodiment, the nucleic acid cleaving agent is an enzyme selected from a group of enzymes consisting of endonucleases and exonucleases. In a preferred embodiment, the endonuclease is a ribonuclease selected from the group consisting of the G-specific T₁ ribonuclease, the A-specific U₂ ribonuclease, the A/U specific phyM ribonuclease, the U/C specific ribonuclease A, the C-specific chicken liver ribonuclease (RNaseCL3), and cusativin.

DESCRIPTION OF DRAWINGS

FIG. 1A (SEQ ID NO: 1) graphically represents the first 120 nucleotides of exon 5 of human p53 as well as the fragments that would result from cleavage of the (+) and (−) strand transcript after G (RNase-T1) or A (RNase-U2). The dotted and full arrows correspond to the resulting ≤3-mer and ≥4-mer cleavage products. The arrows from left to right represent fragments from the (+) strand, while the arrows from right to left represent fragments from the (−) strand. The numbers indicate the neutral molecular masses of the ≥4-mer ribonucleotide fragments. The calculation assumes that all fragments contain 5′-hydroxyl and 3′-phosphate groups.

FIG. 1B shows the size distribution of the products that result from base-specific cleavage of an exemplary sequence 245 nucleotides long.

FIG. 2 summarizes the results of the mutational simulation analysis of a 200-base-pair segment of the HIV protease gene and shows the percentages of the mutational changes that can be detected (hatched bars) and mapped (filled bars). The results were computed for single RNase digests of the (+) and (−) strands with, respectively, RNase-T1 (T1) and RNase-U2 (U2), separately or combined (T1/U2). “All” refers to the analysis with the four different reactions.

FIG. 3 shows the distributions of the number of diagnostic fragments obtained for the mutational simulation analysis of a 1,200 base-pair sequence of HIV when using different length segments of, respectively, 100, 200, 300, and 600 base-pairs.

FIG. 4 summarizes the results of the mutational simulation analysis of a 1,200 base-pair sequence of HIV and shows the percentages of the single nucleotide substitutions that can be detected (hatched bars) and mapped unambiguously (filled bars) as a function of the length of the interrogated segments.

FIG. 5 (SEQ ID NO: 2 and SEQ ID NO: 3) is a graphic representation of the pGEM3-Zf(+) derived nucleotide sequences used as a model in Examples 2 and 4. The regions corresponding to the PCR primers are underlined. Two PCR products (158 and 1012 base-pairs long) were generated. Both amplification products encompass the phage T7 promoter site; the transcription initiation site is indicated with an arrow. The numbering refers to the respective transcripts (118 and 972 nucleotides).

FIG. 6 is a graphical representation of the MALDI-TOF mass spectra of the RNase-A cleavage reactions of pGEM3-Zf(+) derived transcripts. The following transcripts were digested: (A) a regular transcript synthesized with rNTPs, (B) a transcript in which UMP residues are replaced by dTMP, (C) a transcript where UMP is replaced by dUMP, and (D) one that incorporates dCMP instead of CMP. Observed masses are indicated above the peaks that match with predicted digestion products (see Table II).

FIG. 7A (SEQ ID NO: 4 and SEQ ID NO: 5) is a graphical representation of PCR products and transcripts used for diagnostic sequencing of the RNase-T1 coding region. Two parallel amplification reactions were performed with either the upstream or downstream primer tagged to the T7 promoter. The amplification products allow the transcription of the (+; upper sequence) or (−; lower sequence) strand. The underlined region shows the appended T7 promoter site. An arrow indicates the transcription initiation site.

FIG. 7B (SEQ ID NO: 6 through SEQ ID NO: 14) shows the position and nature of a number of single, double, and triple mutations in RNase-T1 (reference denotes the wild-type coding region).

FIG. 8 is a graphical representation of the MALDI-TOF mass spectra obtained for RNase-T1 analysis. Four transcripts were digested with RNase-A: (A) dU-incorporating transcript of the (+) strand, (B) dC-transcript of the (+) strand, (C) dU-transcript of the (−) strand, (D) dC-transcript of the (−) strand. The observed masses of predicted peaks are indicated. Presumed doubly protonated peaks are labeled M²⁺ with the mass of the parental [M+H]⁺ peak indicated between parentheses (FIG. 8B). One of the peaks in FIG. 8D (1207.1+G) is best explained by assuming the addition of an extra G-residue at the transcript 3′-end. FIG. 8C only shows the 900–4800 Da mass range; the digestion product of 11124 Da was not detected.

FIG. 9 (panels A, B, and C) is a graphical representation of the MALDI-TOF mass spectra of the RNase-A cleavage reaction of a pGEM3-Zf(+) derived T7-transcript 972 nucleotides long. The transcript incorporates dCMP instead of CMP residues. The observed masses of the predicted peaks are indicated. An asterisk indicates 2′,3′-cyclic phosphate reaction intermediates (see Table V).

DETAILED DESCRIPTION OF INVENTION

With current capabilities in mass spectroscopy, it is impractical to sequence nucleic acids greater than ~50 bases in length. Consequently, an impractical and cumbersome number of independent sequencing reactions is necessary to cover the thousands of bases of a gene or other genetic region of interest. The methods of the present invention described below overcome this limitation. At the same time, the present method is distinct from the other fragmentation processes that are limited to screening target nucleic acids for a wide range of potential mutations. Indeed, the appropriate choice of complementary cleavage reactions as described herein allows the determination of the exact location and nature of a genetic variation. Also, it is demonstrated herein that computational protocols are an integral part of the described method. The methods and algorithms are required to deduce, on the basis of the reference sequence(s), the relation between (i) the spectral changes associated with one or more cleavage reactions of a given nature, and (ii) the uniquely defined sequence variations.
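One way to explore the relation between spectral changes and sequence variations on the basis of a reference sequence is an in silico simulation in which every possible single nucleotide substitution is introduced and the resulting fingerprints are compared with that of the reference. The following is a minimal sketch under stated assumptions: complete G-specific and A-specific digestion of the (+) and (−) strand transcripts, residue masses rounded to two decimals, idealized peak lists without instrument resolution limits, and a hypothetical reference sequence. It only scores whether a substitution is detected, not whether it can be mapped, and the function names are illustrative rather than part of the disclosed algorithms.

```python
RESIDUE_MASS = {"A": 329.21, "C": 305.18, "G": 345.21, "U": 306.17}
WATER = 18.02  # fragments assumed to carry 5'-hydroxyl and 3'-phosphate ends
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def spectrum(seq, base):
    """Distinct fragment masses after complete cleavage 3' of `base`."""
    masses, total, pending = set(), WATER, False
    for b in seq:
        total += RESIDUE_MASS[b]
        pending = True
        if b == base:
            masses.add(round(total, 2))
            total, pending = WATER, False
    if pending:  # 3'-terminal fragment not ending in `base`
        masses.add(round(total, 2))
    return frozenset(masses)


def fingerprint(plus_strand, reactions):
    """One idealized spectrum per (strand, base) reaction."""
    strands = {"+": plus_strand, "-": reverse_complement(plus_strand)}
    return tuple(spectrum(strands[s], base) for s, base in reactions)


def detected_fraction(reference, reactions):
    """Fraction of single substitutions that change at least one spectrum."""
    ref_print = fingerprint(reference, reactions)
    detected = total = 0
    for i, original in enumerate(reference):
        for alt in "ACGU":
            if alt == original:
                continue
            variant = reference[:i] + alt + reference[i + 1:]
            total += 1
            if fingerprint(variant, reactions) != ref_print:
                detected += 1
    return detected / total


reference = "GGAUCCAGUACGUUAGCAUGGCAUUCGAACGUAGCUAGGCUA"  # hypothetical
single = [("+", "G")]
all_four = [("+", "G"), ("+", "A"), ("-", "G"), ("-", "A")]
print("one reaction:  ", detected_fraction(reference, single))
print("four reactions:", detected_fraction(reference, all_four))
```

Because the four-reaction fingerprint contains the single-reaction spectrum, the fraction detected with four reactions can never be lower than with one. The actual percentages depend on the sequence and on the mass accuracy assumed.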

Sequencing reactions according to the methods of the present invention can be multiplexed, i.e. used for the simultaneous analysis of multiple non-contiguous target regions [supra]. Therefore, the methods can be adapted as a tool for the genome-wide discovery and/or scoring of polymorphisms (e.g. SNPs) useful as markers in genetic linkage studies. Indeed, it will be recognized that the unambiguous identification/diagnosing of a number of variant positions is less demanding than full sequencing and that consequently a considerable number of target genomic loci can be combined and analyzed in parallel, especially when their lengths are kept relatively small. The number of markers that can be scored in parallel will depend on the level of genetic diversity in the species of interest and on the precise method used to prepare and analyze the target nucleic acids, but may typically be in the order of a few tens to up to 100 or more with current MS capabilities. The addition of multiplexing to the high-precision and high-speed characteristics of MS constitutes a new marker technology that enables the large-scale and cost-effective scoring of several (tens of) thousands of markers. Some aspects of the application of the present methods to genome-wide genotyping are described in Example 5.

The present invention provides a mass spectroscopy (MS) based nucleic acid sequencing method that overcomes some of the drawbacks inherent in the prior art. In contrast to the previously described methods, the methods of the present invention do not require the generation of a ladder, i.e. an ordered set of nested nucleic acid fragments characterized by a common end. Rather, the disclosed methods rely on a combination of complementary fragmentation reactions and the analytical resolution power of MS to improve mass resolution and mass accuracy. The present invention is directed to the use of enzymatic cleavage protocols that result in the generation of cleavage products that range from mono- and dinucleotides to fragments of a few tens of nucleotides and that are particularly suited for analysis by MS. According to the present invention, a combination of several different mass spectra, obtained after complementary digestion reactions, coupled with systematic computational analysis allows the survey of a selected nucleic acid or region thereof and leads to the unambiguous assignment of both known and previously unknown sequence variations.

The present invention is also directed to methods for the diagnostic sequencing (also referred to as re-sequencing) of all or part of a sample nucleic acid, i.e. the determination of the presence, the nature and the location of the sequence variations that occur relative to a related known reference sequence. The sequence variations may either be previously identified or hitherto unknown. Diagnostic sequencing according to the present invention may focus on particular positions in a nucleic acid sequence, e.g. when scoring previously known mutations or polymorphisms.

The term “mapping”, as used herein, will be understood to include both the characterization (i.e. determination of the nature) and the determination of the position of the sequence variations.

The terms “target DNA”, “target sequence”, “target nucleic acid” and the like, as used herein, refer to the sequence region which is to be sequenced or re-sequenced entirely or in part as well as to the nucleic acid material that is actually subjected to one or more complementary cleavage reactions.

The terms “reference nucleic acid sequence”, “related sequence”, “previously known sequence”, and the like, refer to a nucleic acid region, the sequence of which has previously been determined and which corresponds to the target. The reference and target sequences may be found to be identical or may differ. The reference sequence need not derive from the same species. In many applications, several different sequence variants will be available as reference. The differences between a target sequence and its reference sequence may be simple (e.g., single nucleotide substitutions, deletions and insertions; microsatellite polymorphisms) or complex (e.g., substitution, insertion, and deletion of multiple nucleotides). In certain situations, one may not know in advance to what reference sequence, if any, the target nucleic acid corresponds. In such situations the interrogated target sequence typically corresponds to a portion of a (much) larger reference sequence and/or to one out of a plurality of different references.

The terms “unambiguous”, “unique”, “unequivocal”, and the like, are used to indicate that only a single sequence variation or combination of sequence variations can explain the observed mass spectral changes.

The terms “complementary (cleavage) reactions”, “complementary cleavages” and the like, as used herein, refer to target nucleic acid digestions characterized by varying specificity [e.g., stringent or relaxed mono- and di-nucleotide specificity; digestion with a combination of reagents; partial cleavage] and/or to the digestion of alternative forms of the target sequence [e.g., the complementary (+) and (−) strands; incorporation of modified subunits; analysis of variable portions of the target sequence].

The terms “transcript” and “transcription”, as used herein, refer to the synthesis of a nucleic acid polymer by means of an RNA polymerase. In addition to canonical subunits (having a 2′-OH group), a transcript may incorporate non-canonical substrates (having any other substituent than a hydroxyl group at the 2′-position). Canonical and non-canonical substrates may contain additional modifications.

The term “genotyping,” as used herein, refers to determining the genetic constitution, which is the particular set of alleles inherited by the organism as a whole, or the type of allele found at a particular locus of interest.

The term “expression profiling,” as used herein, refers to method(s) for determining the mRNA expression profile of a given cell or a population of cells at a given time under a given set of conditions.

Nucleotides are designated as follows. A ribonucleoside triphosphate is referred to as NTP or rNTP; N can be A, G, C, U or m⁵U to denote specific ribonucleotides. Likewise, deoxynucleoside triphosphate substrates are indicated as dNTPs, where N can be A, G, C, T, or U. Throughout the text, monomeric nucleotide subunits are denoted as A, G, C, or T with no particular reference to DNA or RNA. When necessary, the nature of the nucleoside monophosphates is clarified by the use of more specific abbreviations such as U, m⁵U, CMP, and UMP to refer to ribonucleotides and dC, dU, dCMP, dUMP and dTMP to indicate deoxynucleotides. Note that T is not an alternative designation for m⁵U.

Sequencing via Non-Ordered Sets of Specific Cleavage Fragments

The methods of the present invention allow the interrogation of every position in a given target sequence without creating a fragment-ladder, i.e. a nested set of fragments that share one common endpoint. The method comprises, in part, subjecting one or more target nucleic acids to a set of complementary mononucleotide- and/or dinucleotide-specific cleavages, the products of which are analyzed by mass spectroscopy (MS). A preferred method according to the invention includes the specific cleavage of the one or more target nucleic acids at each nucleotide by way of two or more separate reactions. The digestion products obtained in mononucleotide- and dinucleotide-specific cleavage reactions such as those described herein range from mononucleotides to fragments of a few tens of nucleotides and are particularly well suited for analysis by MS. This aspect of the invention overcomes the technical limitation of the short read lengths encountered when analyzing fragment-ladders under the current MS performance. The mass spectra obtained with the methods do not provide a simple readout of the sequence. Computational approaches provided herein allow the comparative analysis of the obtained spectra with those known or predicted for the related reference sequence.

The ability to detect and map sequence variants based on the non-ordered set of cleavage fragments according to the present invention resides in part in the combination of the various complementary site-specific reactions. For example, one cleavage scheme useful in the practice of the present invention makes use of the mononucleotide-specific ribonuclease-T1 (RNase-T1, G-specific) and RNase-U2 (A-specific; the limited specificity of this enzyme is recognized and will be dealt with below). Both purines (A/G) and pyrimidines (C/T) in a target nucleic acid can be examined by cleaving an RNA copy of the two complementary strands of a target nucleic acid with both enzymes. MS analysis of the fragments generated by only a single mononucleotide-specific reaction would detect the presence of most sequence variations, but only a minority of the mutations (in essence those affecting the nucleotide that is recognized) would also be localized. Since the methods of the present invention examine each of the four bases in a given sequence, each of the twelve possible nucleotide substitutions results in the loss of one cleavage site and the concomitant gain of another cleavage site. This principle is illustrated in Table I for the RNase-T1 and RNase-U2 cleavage reactions on the two complementary transcripts of a hypothetical target nucleic acid. Transitions affect both the RNase-T1 and RNase-U2 cleavage patterns of either the (+) or the (−) strand. As can be seen in Table I, all transversions change the cleavage pattern of both strands of the transcript: they affect either one of the RNase digests on both strands, or the T1 digest of one strand and the U2 digest of the complementary strand. In addition to altering two cleavage patterns, each single nucleotide substitution also affects the molecular mass of one fragment in each of the remaining two digestion reactions (Table I). In conclusion, the complementary cleavage reactions of the present invention result in a high degree of built-in redundancy. Each nucleotide substitution is potentially associated with a maximum of ten differences (data points) with respect to the reference spectrum. The loss and gain of a cleavage site are associated with both the disappearance and appearance of three peaks; two additional peaks undergo a shift as a result of a mass difference. In practice, the 1 Da mass difference between C and U(T) may result in the loss of a significant amount of information (Table I). More particularly, in G- and A-specific cleavage reactions, the C/U transitions may go unnoticed while the observed mass difference may not be unambiguously assigned to a certain transversion. However, in preferred methods of the present invention directed to the analysis of RNA target sequences the method makes use of C and/or U analogs that exhibit more favorable mass differences, thus allowing the unambiguous assignment of the mass difference to a particular transversion. Example 1 and Table I illustrate that 5-methyluridine is an example of such a useful analog [m⁵U; R. I. Chemical, Orange, Calif.; see also Hacia J. et al., Nucleic Acids Res. 26: 4975–4982 (1998) for the incorporation of m⁵UTP during in vitro transcription reactions].
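By way of illustration only, the loss and gain of cleavage sites described above can be reproduced with a short in silico digestion. The following Python sketch is a simplified model and not the computational protocol of the invention: the function names, the 10-nucleotide example sequence and the use of base composition as a stand-in for molecular mass are all hypothetical choices, RNase-T1 and RNase-U2 are assumed to be strictly G- and A-specific, and digestion is assumed to be complete.

```python
from collections import Counter

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def digest(rna, base):
    """Cleave on the 3'-side of every occurrence of `base` (complete digestion)."""
    fragments, current = [], ""
    for nt in rna:
        current += nt
        if nt == base:
            fragments.append(current)
            current = ""
    if current:  # 3'-terminal fragment that does not end in `base`
        fragments.append(current)
    return fragments

def minus_strand(plus):
    """Transcript of the complementary strand, written 5'->3'."""
    return "".join(COMPLEMENT[nt] for nt in reversed(plus))

def fingerprint(rna, base):
    """Multiset of fragment compositions (assumed proxy for a mass spectrum)."""
    return Counter("".join(sorted(f)) for f in digest(rna, base))

def spectral_changes(ref_plus, var_plus):
    """Lost and gained peaks for the T1 and U2 digests of both strands."""
    changes = {}
    for strand, make in (("+", lambda s: s), ("-", minus_strand)):
        for enzyme, base in (("T1", "G"), ("U2", "A")):
            ref = fingerprint(make(ref_plus), base)
            var = fingerprint(make(var_plus), base)
            changes[(strand, enzyme)] = (ref - var, var - ref)
    return changes

reference = "UCGAUCAGUC"   # hypothetical (+) transcript
variant = "UCAAUCAGUC"     # G->A transition at position 3
changes = spectral_changes(reference, variant)
data_points = sum(sum(lost.values()) + sum(gained.values())
                  for lost, gained in changes.values())
```

In this example the transition removes a T1 site and creates a U2 site on the (+) strand (three data points each), while each digest of the (−) strand shows one shifted fragment (two data points each), giving the maximum of ten. Because composition rather than mass is compared, C and U are treated as fully resolvable here, which is an idealization of the situation in which a mass-shifted analog is incorporated.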

FIG. 1A shows, by way of example, a 120-nucleotide segment of exon 5 of the p53 gene as well as a graphical representation of the digestion products generated by RNase-T1 and RNase-U2 on an RNA copy of each strand. FIG. 1B displays the size distribution of the base-specific digestion fragments derived from another exemplary sequence and illustrates that mono-, di- and tri-nucleotides are considerably more numerous than the larger digestion products. This distribution is expected for mononucleotide-specific cleavage reactions that generate fragments with an average length of four nucleotides. Contrary to the size distribution, the number of different molecular masses that oligonucleotides can assume rapidly increases with the size of the fragment. Because of the constrained composition of digestion products (e.g. only one G in the case of RNase-T1), the number of molecular masses of mono-, di- and tri-nucleotides is limited to 1, 3 and 6, respectively. Consequently, mono-, di- and tri-nucleotides are often non-informative in the methods of the present invention because their number exceeds the limited mass space. FIG. 1A illustrates that in certain parts of the target sequence one of the cleavage reactions produces many small fragments due to an over-representation of the recognized nucleotide and, consequently, yields virtually no information. However, using the method of the present invention, this problem is minimized by the complementary nature of the four reactions, which ensures that the fragments derived from the same region by the other digestions (interrogating under-represented nucleotides) are correspondingly larger. This indicates a basic attribute of the methods of the present invention. Each of the four cleavage reactions yields information about a particular mutational alteration (see Table I) and, in general, the redundancy in this information enables the identification of the mutation (nature and location) even when part of the information is missing from the spectra as described above.
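The figures of 1, 3 and 6 distinct masses quoted above follow from simple counting, which the following Python sketch makes explicit. It is offered as an illustration under stated assumptions: the function name is hypothetical, each internal RNase-T1 fragment is assumed to carry exactly one G at its 3′-end, and every distinct base composition is assumed to correspond to a distinct mass.

```python
from itertools import combinations_with_replacement

def fragment_compositions(length, cleaved_base="G", alphabet="ACGU"):
    """Distinct base compositions of a digestion fragment of the given length.

    The fragment ends in `cleaved_base`; the remaining length-1 positions are
    drawn, without regard to order, from the other three bases.
    """
    others = [b for b in alphabet if b != cleaved_base]
    return [inner + (cleaved_base,)
            for inner in combinations_with_replacement(others, length - 1)]

# Number of distinct compositions (masses) for mono-, di- and tri-nucleotides.
counts = [len(fragment_compositions(n)) for n in (1, 2, 3)]

# Number of distinct sequences of the same lengths, for comparison.
sequences = [3 ** (n - 1) for n in (1, 2, 3)]
```

For a tetranucleotide the same count gives 10 compositions, which illustrates how the available mass space grows with fragment length while the small fragments crowd into very few mass values.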

The methods of the present invention are therefore largely, yet not completely, sequence-independent and permit the re-sequencing of virtually any variation. Computer simulations of diagnostic sequencing by the present methods, more particularly those involving digestion of RNA copies of each strand with the RNases T1 and U2, have shown that for target sequences of up to three hundred base-pairs ~90% or more of all possible single nucleotide substitutions are associated with ≥4 data points. Fewer than 1% of the substitutions do not result in spectral changes. More than 95% of all possible single nucleotide substitutions give rise to unique spectral changes and can therefore be unambiguously identified (see Example 1 and FIGS. 3 and 4).

In summary, deduction of the sequence according to the methods of the present invention is based on the integration of the information that resides in a complementary set of ‘mass-fingerprints’ as well as the previous knowledge about a related reference sequence. The relationship between this multitude of data allows inferring the presence, nature and position of sequence variations in an unambiguous way. It is illustrative of the method that the derivation of the sequence is not critically dependent on the accuracy, i.e., the absolute values of the mass measurements. It is rather the coherent ensemble of mass-shifts and appearances/disappearances of cleavage sites that uniquely defines the sequence. The computer simulations described herein assumed a resolution of 5 Da or 0.1%, a figure which is well within what can be achieved with state-of-the-art equipment. Also, it should be pointed out that the determination of the correct base composition is limited anyway to short fragments, even in the case of high-precision measurements [e.g., 5-mers in the case of unrestrained sequences and if the measurement has an accuracy of 0.01% or better; Limbach P., Mass Spectrom. Rev. 15: 297–336 (1996)]. Other methods in the art, which involve the accurate mass determination to assign the correct base composition to one or more fragments, will generally permit the detection of most sequence variations but not their unequivocal mapping. In these experiments it is generally assumed that a certain experimental observation relates to one particular previously known sequence variation, ignoring the fact that alternative sequence variations can explain the same result.
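The comparison of an observed spectrum with the reference spectrum at a stated resolution may be sketched as follows. This Python fragment is illustrative only: the function names and the peak values are hypothetical, and the reading of “5 Da or 0.1%” as “whichever tolerance is larger” is an assumption made for the purpose of the example, not a statement of how the simulations described herein were implemented.

```python
def same_peak(m_ref, m_obs, abs_tol=5.0, rel_tol=0.001):
    """True if two masses (Da) cannot be told apart at the assumed resolution."""
    return abs(m_ref - m_obs) <= max(abs_tol, rel_tol * m_ref)

def compare_spectra(reference, observed, abs_tol=5.0, rel_tol=0.001):
    """Return (reference peaks that disappeared, observed peaks that are new).

    Peaks are matched greedily in order of increasing mass; each observed
    peak can account for at most one reference peak.
    """
    unmatched = sorted(observed)
    missing = []
    for m_ref in sorted(reference):
        hit = next((m for m in unmatched
                    if same_peak(m_ref, m, abs_tol, rel_tol)), None)
        if hit is None:
            missing.append(m_ref)
        else:
            unmatched.remove(hit)
    return missing, unmatched

# Hypothetical peak lists (Da); the values carry no experimental meaning.
reference_peaks = [1000.0, 2000.0, 6000.0]
observed_peaks = [1001.0, 2016.0, 6005.5]
missing, new = compare_spectra(reference_peaks, observed_peaks)
```

With these values the 1 Da and 5.5 Da deviations fall inside the tolerance window, whereas the 16 Da shift is reported as one disappearing and one appearing peak, i.e. as two data points of the kind counted above.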

The present invention encompasses several additional embodiments and aspects described hereinafter and certain other embodiments will be readily apparent to one of ordinary skill in the art.

Target Nucleic Acid Preparation and Fragmentation

(a) Derivation of Target Nucleic Acid and Approaches to Cleaving with Base-Specificity

Nucleic acid molecules can be isolated from a particular biological sample using any of a number of procedures, which are well-known in the art, the particular isolation procedure chosen being appropriate for the particular biological sample. To obtain an appropriate quantity of isolated target nucleic acid on which to perform the methods of the present invention, amplification of the target nucleic acid may be necessary. Examples of appropriate amplification procedures for use in the invention include but are not limited to: cloning [Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press (1989)], polymerase chain reaction (PCR) [Newton C. R. and Graham A., PCR, BIOS Publishers (1994)] and variations such as RT-PCR [Higuchi et al., Bio/Technology 11: 1026–1030 (1993)] and allele-specific amplification (ASA), strand displacement amplification (SDA) [Terrance Walker G. et al., Nucleic Acids Res. 22: 2670–77 (1994)], and transcription based processes.

One embodiment of the present invention is directed to methods for sequencing (re-sequencing, etc.) nucleic acid comprising the digestion of an RNA copy of each strand of the target nucleic acid with the RNases T1 and U2. One of the advantages of the method is the use of RNA, which exhibits higher sensitivity and better stability in MALDI-MS compared to DNA [Hahner S. et al., Nucleic Acids Res. 25: 1957–1964 (1997)]. Typically, the first stage of this aspect of the invention involves the amplification of the target nucleic acid by PCR or reverse-transcription followed by PCR (RT-PCR). This can be achieved with a pair of dedicated primers that incorporate promoter sequences as non-annealing 5′-extensions. In a second stage, these promoters are used for the specific transcription of the adjacent sequences including the target sequences. Preferably, the promoter sequences are small and permit the in vitro transcription by a single subunit cognate RNA polymerase such as those deriving from bacteriophage T7, T3 and SP6. Preferred for use in this aspect of the invention are C and/or U analogs that can be incorporated during transcription and that exhibit favorable mass differences [e.g. m⁵U; supra]. The use of PCR primers that carry different promoter sequences permits the generation of an RNA copy of both strands in two parallel strand-specific transcription reactions. Both strands may also be transcribed from the same promoter sequence: this requires two parallel amplification reactions with only one promoter tagged primer. Alternatively, the in vitro transcripts may also be produced from sequences cloned in special purpose vectors such as the pGEM-type vectors available from Promega (Madison, Wis.) which contain appropriate promoters. The third stage comprises the treatment of the resultant RNA transcripts with one or more complementary mononucleotide-specific RNases (e.g. RNase-T1 and RNase-U2), such that each desired position in the target sequence is interrogated. The final stage in the process consists of the mass-spectrometric analysis of the RNA fragments resulting from the complementary cleavage reactions and the comparison of the spectra obtained with those of the known reference sequence.

Alternative schemes to prepare target nucleic acid obtained from a biological sample and to subject the target sequence to a set of complementary mononucleotide-specific cleavage reactions are also within the scope of the invention. The target nucleic acid can be DNA, cDNA, any type of RNA, DNA/RNA hybrid, or of mosaic RNA/DNA composition [depending on the ratio of ribo- and deoxyribonucleoside triphosphates (rNTP/dNTP) in the synthesis reaction; Sousa R. and Padilla R., EMBO J. 14: 4609–4621 (1995); Conrad F. et al., Nucleic Acids Res. 23: 1845–1853 (1995)]. The target sequence may also include modifications that are either introduced during or after enzymatic synthesis.

In general, different forms of each target sequence will be prepared so as to be able to perform a complementary set of mono-specific cleavage reactions. The cleavage reactions may be performed enzymatically and/or chemically. The mononucleotide-specificity of the digestion reactions may reside in the cleaving agent (e.g. RNase T1), in the structure of the target nucleic acid, or in a combination of both. For example, RNase A (specific for both C- and U-residues) can be made monospecific by modifications of the substrate sequence that block the ribonucleolytic action at C or U residues. RNase A cleavage at U residues can in theory be prevented by chemical modification [Simoncsits A. et al., Nature 269: 833–836 (1977)]. The enzymatic incorporation of nucleotide analogs, most notably those modified at the 2′-hydroxyl group of the ribose, is particularly preferred in the practice of the invention. A variety of such analogs have been demonstrated to be substrates for T7 RNA polymerase; e.g. 2′-fluoro, 2′-amino [Aurup H. et al., Biochemistry 31: 9636–9641 (1992)], 2′-O-methyl [Conrad F. et al., Nucleic Acids Res. 23: 1845–1853 (1995)], as well as 2′-deoxy NTPs [Sousa R. and Padilla R., EMBO J. 14: 4609–4621 (1995); Conrad F. et al., Nucleic Acids Res. 23: 1845–1853 (1995)]. The above strategy may also be used to improve the specificity of certain RNases such as RNase U2 which is said to cleave GpN phosphodiester bonds in extensive digests [Brownlee G., in “Laboratory Techniques in Biochemistry and Molecular Biology” (Work T. S. and Work E., eds.), North-Holland, Amsterdam, pp 199–200 (1972)]. Mosaic DNA/RNA target sequences that incorporate only one specific rNTP and that can be obtained quite efficiently with particular mutant polymerases [Sousa R. and Padilla R., EMBO J. 14: 4609–4621 (1995); Gao G. et al., Proc. Natl. Acad. Sci. USA 94: 407–411 (1997); Bonnin A. et al., J. Mol. Biol. 290: 241–251 (1999)], may allow mono-specific cleavages by alkaline treatment or by digestion with a non-specific RNase such as RNase-I [Meador J. et al., Eur. J. Biochem. 187: 549–553 (1990)].

Alternative strategies to obtain selective cleavage of target sequences make use of phosphorothioate chemistry. DNA and RNA polymers with phosphorothioate internucleoside linkages in the Rp stereo-configuration are readily synthesized [see Eckstein F., Ann. Rev. Biochem. 54: 367–402 (1985) and references cited therein]. Such phosphorothioate linkages can be specifically hydrolyzed following alkylation [Gish G. and Eckstein F., Nucleic Acids Symp. Ser. pp 253–256 (1987); Gish G. and Eckstein F., Science 240: 1520–1522 (1988)]. Mono-nucleotide specific fragmentation according to this aspect of the invention would require the synthesis of targets making use of one particular α-thio nucleotide triphosphate substrate. Some nucleases (e.g. nuclease P1) cannot hydrolyze Rp phosphorothioate diesters; indirect selective cleavage (at a natural phosphodiester) may thus be obtained with target sequences that incorporate three different αS-dNTPs (or αS-rNTPs).

(b) Alternative Complementary Reactions

The performance of the present sequencing methods will be understood by those skilled in the art to be dependent on the following interrelated factors: (1) the length of the region to be sequenced, (2) the resolution of the MS analysis, and (3), to some extent, the sequence itself. The longer the region of interest and, consequently, the larger the number of digestion products, the more important the resolution becomes. Also, the length of the region to be sequenced is directly proportional to the number of single nucleotide substitutions that cannot be unambiguously mapped on the basis of the four base-specific fragmentation patterns only (Example 1; FIG. 4). Some sequence motifs are intrinsically difficult to sequence. An example of such a sequence is CTAGC₁C₂C₃C₄C₅GATC (SEQ ID NO: 15), where mutations at C₁ and C₂ cannot be discriminated from the same type of mutations at C₅ and C₄, respectively. Another such sequence is GAG₁A₂G₃A₄GA, where G₁->A cannot be discriminated from the G₃->A mutation; similarly, A₂->G and A₄->G cannot be distinguished. Finally, the four mono-nucleotide specific cleavages may also be insufficient to analyze complex sequence variations (see discussion below). Most preferably, therefore, the practicing of the present invention includes a computer-aided simulation of the re-sequencing strategy of the intended region. Such simulation and analysis will reveal possible problematic positions in the sequence and can be used to assess the usefulness of certain additional complementary cleavage reactions as countermeasures to overcome such sequencing difficulties.
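A computer-aided simulation of the kind recommended above might, in its simplest form, enumerate every single nucleotide substitution and group those that produce identical predicted spectra. The Python sketch below is a minimal illustration and not the simulation software used in Example 1: the function names are hypothetical, the motif is analyzed in isolation as an RNA transcript (T written as U), the four reactions are assumed to be complete T1 (G) and U2 (A) digests of both strands, and base composition is used as an idealized substitute for mass.

```python
from collections import Counter, defaultdict

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def digest(rna, base):
    """Cleave on the 3'-side of every occurrence of `base`."""
    fragments, current = [], ""
    for nt in rna:
        current += nt
        if nt == base:
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments

def fingerprint(plus):
    """Hashable summary of the four digests (T1 and U2 on both strands)."""
    minus = "".join(COMPLEMENT[nt] for nt in reversed(plus))
    key = []
    for strand in (plus, minus):
        for base in ("G", "A"):
            comps = Counter("".join(sorted(f)) for f in digest(strand, base))
            key.append(tuple(sorted(comps.items())))
    return tuple(key)

def ambiguous_substitutions(reference):
    """Groups of substitutions (1-based position, old, new) with equal spectra."""
    groups = defaultdict(list)
    for pos, old in enumerate(reference):
        for new in "ACGU":
            if new != old:
                variant = reference[:pos] + new + reference[pos + 1:]
                groups[fingerprint(variant)].append((pos + 1, old, new))
    return [g for g in groups.values() if len(g) > 1]

motif = "CUAGCCCCCGAUC"   # CTAGC1C2C3C4C5GATC written as RNA
groups = ambiguous_substitutions(motif)
```

Under these assumptions the C->A substitutions at C₁ and C₅ (positions 5 and 9 of the motif) fall in the same group, as do those at C₂ and C₄ (positions 6 and 8), in line with the ambiguity described above. Adding a further reaction to `fingerprint`, for instance a digest of a differently delimited segment, is one way to test whether a proposed countermeasure separates such a group.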

One such measure consists of dividing the target region and deriving two or more (partially overlapping) segments (e.g., amplicons) from the sample nucleic acid rather than sequencing the target region as a whole. In addition to setting the length, this allows some control to be exerted over the composition. This would abrogate problems arising when the region of interest contains a duplicated segment. A second measure consists of carrying out one or more alternative or additional reactions involving target fragments that incorporate one or more modified nucleotides that exhibit different molecular masses such as is described above. Those of skill in the art will know of the existence of a wealth of mass-modified nucleotide analogs, many of which are useful and can be reconciled with the enzymatic procedures of the present method. The nucleotide analogs will differentially affect the masses of many of the digestion products and will therefore yield a significantly different spectrum that may reveal the required information. The analogs U and m⁵U [supra] exemplify this. Simulation studies (which model the present invention) have indicated that the use of U resolves certain sequence ambiguities observed with m⁵U (data not shown), while overall the latter nucleotide analog results in considerably fewer sequence ambiguities (see Example 1).

Another option consists of performing one or more additional reactions on the complementary strand. Compared to, for example, a G-specific cleavage of one strand, the C-reaction of the complementary sequence will yield a different set of fragments characterized by other mass differences. The effect of including reactions on the complementary strand of the target sequence is therefore similar to the use of nucleotide analogs.

Still another alternative provided by the present invention and which is useful in obviating the potential problems exemplified above includes using reactions with alternative specificities of cleavage. For example, partial base-specific cleavage can be achieved by changing the reaction conditions or by use of a specially prepared target wherein the cleavable and uncleavable (e.g. 2′-modified; supra) forms of one particular nucleotide occur randomly. Alternatively, instead of partial base-specific cleavages, one or more specific digestions characterized by a greater stringency can be performed (e.g. dinucleotide- or relaxed dinucleotide-specificity; see below). The digestion of the target sequence, in double stranded DNA form, with restriction enzymes is still another alternative provided by the present invention. Double digestion (i.e. a combination of two base-specific cleavages) of target nucleic acid alone or in combination with other digestion methods of the present invention also represents an informative alternative within the scope of the present invention.
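The additional information contributed by a partial base-specific cleavage can be appreciated by listing the fragments it may contain. In the following Python sketch, which is illustrative only, the function name and the five-nucleotide example are hypothetical, and it is assumed that every potential cleavage site is cut in some molecules and left intact in others, so that every stretch between two potential cut points (or transcript ends) is represented.

```python
def partial_digest(rna, base):
    """All distinct fragments obtainable when cleavage 3' of `base` is partial."""
    # Potential cut points: transcript start, after each internal `base`, end.
    cuts = [0] + [i + 1 for i, nt in enumerate(rna[:-1]) if nt == base] + [len(rna)]
    fragments = {rna[a:b] for i, a in enumerate(cuts) for b in cuts[i + 1:]}
    return sorted(fragments, key=lambda f: (len(f), f))

example = "AGCGU"                      # hypothetical transcript
partial = partial_digest(example, "G")
```

A complete G-specific digest of the same sequence would contain only AG, CG and U; the partial digest additionally contains the overlapping products CGU, AGCG and the uncut transcript, which link neighbouring complete-digestion fragments to one another.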

Another informative option within the scope of the present invention involves the analysis of truncated target sequences. More specifically, cleavage of chain terminated sequences prepared, for example, by incorporation of a particular 3′-deoxy nucleotide substrate, will yield spectra that contain additional fragments when compared to the spectrum of the full target nucleic acid and will consequently provide additional information that will, in certain cases, allow a more unambiguous identification of sequence variation. This approach will be particularly useful for the characterization of lengthy digestion products or regions containing complex sequence variations.

(c) Alternative Complementary Reactions: Cleavage Characterized by a Greater than Mononucleotide Specificity

In still another of its embodiments, the method of the present invention also includes nucleolytic processes that are characterized by a dinucleotide- or a relaxed dinucleotide-specificity. Such stringency of cleavage will facilitate the analysis of longer target sequences because the size distribution of the resultant digestion products is even better suited for analysis by MS than fragments with an average length of 4 nucleotides that are generated by mononucleotide-specific cleavage. Useful in this aspect of the invention are, for example, restriction endonuclease reagents capable of cutting DNA at dinucleotide sequences such as those described by Mead D. et al., WO 94/21663 (PCT/US94/03246). RNases that preferentially hydrolyze pyrimidine-adenosine (CA and UA) bonds have also been identified which are useful in the practice of the present invention [E. coli RNase-M, Cannistraro V. and Kennell D., Eur. J. Biochem. 181: 363–370 (1989); as is an endoribonuclease isolated from Saccharomyces cerevisiae, Stevens A. et al., J. Bacteriol. 164: 57–62 (1985); and as is the Enterobacter sp. C-ribonuclease, described by Marotta C. et al., Biochemistry 12: 2901–2904 (1973)]. As disclosed and exemplified in the present invention, the specificity of these enzymes can, if need be, essentially be restricted to CA- or UA-bonds by the use of target nucleic acids that incorporate dUMP (or dTMP) on the one hand and dCMP on the other hand.

Stringent or relaxed dinucleotide-specific cleavage may also be engineered through the enzymatic and chemical modification of the target nucleic acid. By way of non-limiting example, transcripts of the nucleic acid of interest may be synthesized with a mixture of regular and α-thio-substrates and the phosphorothioate internucleoside linkages may subsequently be modified by alkylation using reagents such as an alkyl halide (e.g. iodoacetamide, -iodoethanol) or 2,3-epoxy-1-propanol. The phosphotriester bonds formed by such modification are not expected to be substrates for RNases. Using this procedure, a mono-specific RNase, such as RNase-T1, can be made to cleave any three, two or one out of the four possible GpN bonds depending on which substrates are used in the α-thio form for target preparation. The repertoire of dinucleotide-specific reagents useful in the practice of the present invention may be further expanded by using additional RNases, such as RNase-U2 and RNase-A. In the case of RNase-A, the specificity may be restricted to CpN or UpN dinucleotides through the enzymatic incorporation of the 2′-modified form of the appropriate substrates as described above. For example, to make RNase-A specific for CpG dinucleotides, a transcript (target) is prepared using the following substrates: αS-dUTP, αS-CTP, αS-ATP, and GTP. Thus, using the methods described herein, it is possible to engineer all 16 dinucleotide specificities. However, not all dinucleotide-specific reagents described herein would be required if the complementary strand of the target nucleic acid is included in the analysis.
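The effect of combining a mono-specific RNase with alkylated phosphorothioate linkages can be modelled as a cleavage rule that depends on the nucleotide following the recognized base. The Python sketch below is a hedged illustration: the function name and test sequence are hypothetical, and it is assumed that an alkylated linkage introduced with a given αS-substrate blocks cleavage completely whenever that nucleotide lies on the 3′-side of the bond.

```python
def restricted_digest(rna, cleaved_base, blocked_next=frozenset()):
    """Cleave 3' of `cleaved_base` unless the next nucleotide is blocked.

    `blocked_next` lists the nucleotides incorporated in alpha-thio form
    (and alkylated); the XpN bond on their 5'-side is assumed uncleavable.
    """
    if not rna:
        return []
    fragments, start = [], 0
    for i in range(len(rna) - 1):
        if rna[i] == cleaved_base and rna[i + 1] not in blocked_next:
            fragments.append(rna[start:i + 1])
            start = i + 1
    fragments.append(rna[start:])
    return fragments

target = "ACGUCCGACAUCGG"   # hypothetical transcript

# RNase-A made C-specific (dU incorporated), with U, C and A in alpha-thio
# form: only CpG bonds remain cleavable.
cpg_only = restricted_digest(target, "C", blocked_next={"U", "C", "A"})

# The same enzyme without any blocked linkage cleaves after every C.
all_c = restricted_digest(target, "C")
```

Changing `cleaved_base` and `blocked_next` reproduces, within this simplified model, the other dinucleotide specificities referred to above.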

The strategy outlined above makes it possible to prevent cleavage within homopolymer tracts (stretches of A's, G's, C's or T's) by an RNase that is specific (or is made specific as described above) for the repeated nucleotide. Indeed, incorporation of a particular αS-NTP, followed by alkylation, will selectively prevent cleavage within repeated stretches of that nucleotide, allowing cleavage to occur at the 3′-side of the last nucleotide in the repeat. Simulation studies, similar to those described in Example 1, have identified this as a particularly useful strategy. Sequence analysis by digestion of the two complementary strands with RNase-T1 and RNase-U2 yielded a 5- to 10-fold reduction in the number of ambiguous mutations when αS-GMP and αS-AMP were incorporated in the respective transcripts. These studies also suggest that the selective blockage of cleavage within repeats is accompanied by a relatively small increase in the average length of the digestion products, thereby resulting in considerably less loss of information.
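The consequence of blocking cleavage within a repeat can be visualized by comparing an ordinary digest with a repeat-protected digest of the same transcript. The following Python sketch uses regular expressions for brevity; the function names and the example sequence are hypothetical, and complete blockage of every bond inside the repeat is assumed.

```python
import re

def ordinary_digest(rna, base="G"):
    """Cleavage 3' of every occurrence of `base`, including inside repeats."""
    b = re.escape(base)
    return re.findall(f"[^{b}]*{b}|[^{b}]+$", rna)

def repeat_protected_digest(rna, base="G"):
    """Cleavage only 3' of the last `base` in each run of that nucleotide."""
    b = re.escape(base)
    return re.findall(f"[^{b}]*{b}+|[^{b}]+$", rna)

transcript = "AUGGGGCAGUC"   # hypothetical transcript with a G4 tract
ordinary = ordinary_digest(transcript)
protected = repeat_protected_digest(transcript)
```

In the ordinary digest the G tract contributes three uninformative G mononucleotides, whereas in the protected digest the tract stays attached to its 5′-flanking sequence as a single, larger fragment whose mass reports on the length of the repeat.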

Those of skill in the art will also readily recognize variations or alternatives in certain aspects of the fragmentation methods described herein. Such alternatives or variations encompassed by the present invention include but are not limited to:

-   1. the use of other or additional RNases (alone or in combination) having similar or alternative specificities;
-   2. the use of mutant or chemically modified RNases with useful characteristics vis-a-vis the methods of the present invention [see for example, Loverix S. et al., Nature Struct. Biol. 5: 365–368 (1998) for an RNase T1 mutant that prefers the phosphorothioate analog over the natural phosphodiester substrate; see also Contreras R. and Fiers W., FEBS Lett. 16: 281–283 (1971) for the production of limited digests with a chemically modified RNase];
-   3. the use of other nucleotide analogs that exhibit different masses and/or reactivities, including nucleotides that incorporate alternative isotopes; and
-   4. alternative specific fragmentation methods, either chemical [Maxam A. and Gilbert W., Proc. Natl. Acad. Sci. USA 74: 560–564 (1977); Richterich P. et al., Nucleic Acids Res. 23: 4922–4923 (1995)], or enzymatic.

Multiplex Reactions

In another embodiment, the methods of the present invention are directed to the simultaneous sequence determination of at least two non-contiguous regions in a sample nucleic acid. In contrast to traditional sequencing methods that generate a fragment-ladder (i.e. a nested set of fragments that share a common endpoint), the strategies outlined herein are equally useful for multiplex sequencing. Multiplex sequencing, according to the present invention, generally involves the co-amplification of selected regions of target nucleic acids. This can be achieved by using sets of dedicated primer pairs which flank or are co-terminal with a target nucleic acid to be amplified. Alternatively, the preparation of the multiple target nucleic acids comprises the concomitant amplification of restriction fragments derived from the sample nucleic acid. Some approaches are illustrated and exemplified in Example 5. A special case of multiplex sequencing consists of the simultaneous analysis of the two complementary strands of a double stranded target nucleic acid.

In yet another embodiment, the methods of the present invention can be used for the simultaneous sequence determination of the corresponding target region(s) of at least two biological samples. A sequence variation in one out of a pool of analogous target nucleic acids may go unnoticed when analyzing conventional sequence ladders by means of gel electrophoresis. With the present methods, a sequence variation will, as a rule, yield one or more distinct peaks in the various complementary mass spectra. This feature should allow the detection of mutations at a significantly lower ratio of mutant to wild-type allele and therefore permit the analysis of larger pools. The ability to pool renders the present methods useful for the discovery of sequence variations across particular target regions in a given population. For this application, typically 5–10 samples may be combined. In case the mutations have previously been identified, considerably more samples, e.g. several tens, can be combined. The characteristics that render the present method useful for the analysis of sample pools make the method also effective for the analysis of heterozygous samples (i.e., an equimolar mix of two alleles).

Mass Spectrometric Methods

Mass-spectrometric methods useful in the practice of the present invention include ionization techniques such as matrix assisted laser desorption ionization (MALDI) and electrospray (ES). These ion sources can be matched with various separation/detection formats such as time-of-flight (TOF; using linear or reflectron configurations), single or multiple quadrupole, Fourier transform ion cyclotron resonance (FTICR), ion trap, or combinations of these as is known in the art of mass spectrometry [Limbach P., Mass Spectrom. Rev. 15: 297–336 (1996); Murray K., J. Mass Spectrom. 31: 1203–1215 (1996)].

Because the present methods generally require the analysis of complex oligonucleotide fragment mixtures, the MALDI approach, mostly resulting in singly charged molecules, is preferred over ES where significant multiple charging will further increase the number of spectral peaks. For the desorption/ionization process, numerous matrix/laser combinations can be employed.

Sequence Determination of Simple Versus Complex Variations

In another embodiment, the methods of the present invention are directed to the diagnostic sequencing of one or more target nucleic acids that, in comparison with a related reference nucleic acid, incorporate a sequence variation other than a single nucleotide substitution. Such a sequence variation can involve the deletion or insertion of one or more nucleotides as well as the substitution of multiple nucleotides.

Similar to single nucleotide substitutions, the insertion or deletion of a single nucleotide represents a simple sequence variation whose analysis using methods of the present invention is straightforward. Both of these types of sequence variations are associated with a characteristic set of (maximum nine) changes in the four complementary mononucleotide-specific fragmentation patterns. It will be understood that the methods of the present invention, similar to other sequencing methods, may not unambiguously locate the point of insertion or deletion when it concerns one nucleotide in a stretch of identical nucleotides. This, however, may be taken into consideration when performing a computer assisted analysis of whether the observed spectra relate in a unique way to a specific sequence variant in accordance with the practice of the present invention.

Analysis of a microsatellite DNA [also referred to as VNTR (variable number tandem repeat) or SSR (simple sequence repeat)] represents a special case whose analysis is readily achieved using the methods of the present invention. Although multiple nucleotides are involved with VNTRs or SSRs, the interpretation of the spectral changes on the basis of the known reference sequence is rather simple and the polymorphism (an altered number of repeat units) may readily be characterized.

The methods of the present invention may also be used to analyze more complex sequence variations such as those where multiple nucleotides are affected either through insertion, deletion, substitution or a combination thereof. The analysis of a number of double and triple mutants is described below in Example 3d. Multiple substitutions within a target sequence are also expected to be accompanied by a characteristic number of spectral changes. This number depends on whether the substitutions are adjoining or separated, as well as on the intervening sequence in case the mutations are separated. Single nucleotide substitutions, isolated by a sequence that contains at least one A, G, C, and T, are each associated with 10 spectral differences as outlined above. In general, the analysis of complex sequence variants will require (elaborate) computational approaches. One possible algorithm involves the comparison of the experimentally observed spectra with those generated on the basis of all possible sequences in the short region to which the sequence variation is confined. Such an algorithm will identify the sequence variant or, in case of ambiguities, the different matching sequences. This procedure illustrates that the present methods may be applied to the de novo sequencing of short regions of a target sequence. It will be recognized that, in practice, the experimental observations will not only set the boundaries but will also define the length of the variant region such that the algorithm need not consider insertions or deletions. Additional experimentally derived information, such as the absence of a particular nucleotide, can further limit the sequence space the algorithm has to explore. In particular applications, the complex sequence variants may be previously known and may thus be part of the set of reference sequences. In such cases, the experimentally observed spectra may be directly correlated to those predicted for the reference sequences. There would however still be a need to compute whether such correlation is unique. The advantage of previous knowledge is that the experimental approach can be adapted such that the output information indeed relates uniquely to the potentially occurring complex sequence variations.

Computer Algorithm

The present invention, in part, rests on the insight that computational analysis of the spectra obtained in a set of complementary cleavage reactions, and comparison of these data with the computationally predicted spectral changes from the known reference sequence, as illustrated herein, is an important step in the unambiguous determination of the presence, the nature and the location of sequence variations. More specifically, the computational approaches to simulate the experiment illustrated herein are necessary to determine whether a unique relation exists between the spectra obtained and a particular sequence variation. Accordingly, one aspect of the present invention contemplates a method which utilizes a computer algorithm or method capable of computing the spectral differences resulting from one or more nucleotide differences between the target nucleic acid and the reference nucleic acid, the method and algorithm comprising subjecting the reference nucleic acid and sequence variants thereof (i.e., target nucleic acid having nucleotide differences) to the different base specific cleavages to generate oligonucleotide fragments, computing the mass of each oligonucleotide fragment, generating the mass spectra of the oligonucleotide fragments from the reference nucleic acid and the sequence variants thereof for each of the base specific cleavage reactions, and matching these computationally derived mass spectra with the spectra obtained experimentally in the different base specific cleavage reactions.
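The steps named in the preceding paragraph (in silico cleavage, fragment mass computation, and comparison of predicted spectra) may be sketched as follows. This is an illustrative sketch only, under stated assumptions: the residue masses are approximate average values of the kind used in common oligonucleotide molecular-weight calculators, every fragment is treated as carrying a 5′-hydroxyl and a 3′-phosphate, and all names (`cleave`, `spectrum`, `spectral_changes`) and sequences are hypothetical.

```python
# Approximate average masses (Da) of ribonucleotide residues (assumed values)
RESIDUE_MASS = {"A": 329.21, "C": 305.18, "G": 345.21, "U": 306.17}
WATER = 18.02  # added once per fragment (5'-OH / 3'-phosphate ends assumed)


def cleave(seq, base):
    """Base-specific digest: cut 3' of every occurrence of `base`."""
    fragments, start = [], 0
    for i, nt in enumerate(seq):
        if nt == base:
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])
    return fragments


def spectrum(seq, base):
    """Set of predicted fragment masses, rounded to 0.1 Da."""
    return {round(sum(RESIDUE_MASS[n] for n in f) + WATER, 1)
            for f in cleave(seq, base)}


def spectral_changes(reference, variant, base):
    """Return (appearing, disappearing) masses for one cleavage reaction."""
    ref, var = spectrum(reference, base), spectrum(variant, base)
    return sorted(var - ref), sorted(ref - var)


reference = "AAGCUGAUCG"  # hypothetical reference segment
variant = "AAGCUAAUCG"    # G -> A at position 6
appearing, disappearing = spectral_changes(reference, variant, "G")
print(cleave(reference, "G"))   # ['AAG', 'CUG', 'AUCG']
print(cleave(variant, "G"))     # ['AAG', 'CUAAUCG']
print(len(appearing), len(disappearing))  # 1 2
```

In this toy case the loss of a G cleavage site merges two reference fragments into one, so two peaks disappear and one appears in the G-specific digest; the same comparison would be repeated for each complementary cleavage reaction.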

In one preferred embodiment the computer algorithm is designed to systematically compute the spectra of all possible simple nucleotide variations of the reference nucleic acid, including but not limited to all possible single nucleotide substitutions, deletions and insertions. Since most of the genetic diversity found in living organisms involves single nucleotide variations, most of the experimentally observed sequence variations can be identified with the methods and algorithms of the present invention, meaning that one or more matches may be found between the observed spectra and the computationally derived mass spectra. In case a unique match is found, the sequence variation in the target nucleic acid is identified unambiguously. When more than one match is found between spectra, the sequence variation cannot be established unambiguously.
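A systematic enumeration of this kind can be illustrated with a short Python sketch. It is a simplified stand-in, not the algorithm of the invention: the predicted spectra are replaced by a hypothetical `fingerprint` built from the base composition of each fragment in four base-specific digests of a single strand, and the function names and sequences are assumptions made for illustration.

```python
BASES = "ACGU"


def simple_variants(ref):
    """Yield (label, sequence) for every single substitution, deletion
    and insertion of `ref` (positions are 1-based)."""
    for i, old in enumerate(ref):
        for new in BASES:
            if new != old:
                yield f"sub {old}{i + 1}{new}", ref[:i] + new + ref[i + 1:]
        yield f"del {old}{i + 1}", ref[:i] + ref[i + 1:]
    for i in range(len(ref) + 1):
        for new in BASES:
            yield f"ins {new} before {i + 1}", ref[:i] + new + ref[i:]


def fingerprint(seq):
    """Stand-in for a set of predicted spectra: sorted fragment compositions
    for each of the four base-specific digests (composition fixes the mass)."""
    digests = []
    for base in BASES:
        parts = seq.replace(base, base + " ").split()  # cut 3' of `base`
        digests.append(tuple(sorted("".join(sorted(p)) for p in parts)))
    return tuple(digests)


def match(ref, observed):
    """Labels of all simple variants whose fingerprint equals `observed`."""
    return [label for label, seq in simple_variants(ref)
            if fingerprint(seq) == observed]


ref = "GAUCCGUA"  # hypothetical reference segment
print(match(ref, fingerprint("GAUCCGCA")))  # ['sub U7C']
print(match(ref, fingerprint("GAUCGUA")))   # ['del C4', 'del C5']
```

The first query returns a unique match. The second returns two matches because deleting either C of the CC pair gives the same sequence, which is the positional ambiguity within a run of identical nucleotides noted earlier in this description.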

It will be obvious to the person skilled in the art that different approaches may be used for performing the computational analysis, such as, but not limited to, performing the computational analysis on the complete reference sequence, or performing a serial computational analysis on segments of the reference sequence using, for example, a sliding window. The latter approach will enable the identification of different sequence variants occurring in different parts of the reference sequence.
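The sliding-window variant of the analysis may be sketched as below. The window size and step are hypothetical parameters chosen for illustration; the source does not prescribe particular values.

```python
def windows(reference, size, step):
    """Yield (start, segment) pairs that tile `reference` with windows of
    `size` bases, advancing by `step`; a final window is added when needed
    so that the 3' end is always covered."""
    last = max(len(reference) - size, 0)
    for start in range(0, last + 1, step):
        yield start, reference[start:start + size]
    if last % step:
        yield last, reference[last:last + size]


# Toy example with letters standing in for a reference sequence
print(list(windows("ABCDEFGHIJ", 4, 3)))
# -> [(0, 'ABCD'), (3, 'DEFG'), (6, 'GHIJ')]
print(list(windows("ABCDEFGHIJ", 4, 4)))
# -> [(0, 'ABCD'), (4, 'EFGH'), (6, 'GHIJ')]
```

Each segment would then be analyzed separately, so that variants in different parts of the reference sequence are matched independently of one another.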

In another embodiment, the methods and computer algorithms of the present invention are designed to explore all possible nucleotide sequences in a limited segment of the reference sequence. Such methods and algorithms may be used when the preceding approach fails to give a match, demonstrating that the sequence variation does not correspond to a simple nucleotide variation in the reference nucleic acid. This may be the case when more than one nucleotide change occurs within a short region, such that one or more cleavage products contain multiple nucleotide alterations. The region corresponding to these cleavage products can then be explored further by computing the spectra for all possible sequence permutations and determining the matching sequence. It is anticipated that given sufficient computing power, such methods and algorithms may be used for de novo sequencing using mass spectral data generated according to the present invention.
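The exhaustive exploration of a short region may be illustrated as follows. This sketch assumes that the flanking sequence and the length of the variant region are already known from the experimental observations; `fingerprint` is the same hypothetical composition-based stand-in for predicted spectra used for illustration, and the sequences are invented.

```python
from itertools import product

BASES = "ACGU"


def fingerprint(seq):
    """Stand-in for predicted spectra: sorted fragment compositions for
    each of four base-specific digests of one strand."""
    digests = []
    for base in BASES:
        parts = seq.replace(base, base + " ").split()  # cut 3' of `base`
        digests.append(tuple(sorted("".join(sorted(p)) for p in parts)))
    return tuple(digests)


def explore(left, right, length, observed):
    """Try all 4**length sequences between the fixed flanks and return
    those whose fingerprint equals `observed`."""
    hits = []
    for combo in product(BASES, repeat=length):
        middle = "".join(combo)
        if fingerprint(left + middle + right) == observed:
            hits.append(middle)
    return hits


observed = fingerprint("GAUC" + "AC" + "GUA")  # pretend this was measured
print(explore("GAUC", "GUA", 2, observed))     # ['AC']
```

The search space grows as 4 to the power of the region length, which is why the approach is practical only for short regions unless additional experimental constraints narrow the candidates.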

Applications of the Present Methods

The methods of the present invention are particularly well suited for rapidly and accurately re-sequencing nucleic acids from a variety of biological sources including, but not limited to, plants, animals, fungi, bacteria and viruses. Re-sequencing implies the detection and mapping of both previously known as well as unknown sequence variations (e.g. mutations and polymorphisms) relative to a related reference sequence. One of the most notable distinctions with respect to conventional gel-electrophoretic analysis of fragment ladders is that generally each particular sequence (variation) results in a distinct and characteristic set of (mass) peaks. This feature makes the present methods effective for the reliable scoring of heterozygous samples, the simultaneous sequencing of multiple target regions from a single biological sample (i.e., multiplexing), as well as the simultaneous analysis of the analogous regions from different samples (i.e., pooling). The use of pools of individual samples should permit the cost-effective identification of previously unknown sequence variations in a population. This property makes the present methods valuable for clinical and public health studies. Very often such studies rely on samples (e.g., saliva, blood, swabs, paraffin-embedded tissue, biopsy material) that are cellularly and genetically heterogeneous and, consequently, require assays that can detect mutations at a low ratio of mutant over wild-type allele.

An additional advantage of the present methodology is that it can be tuned (by reducing the number of complementary cleavage reactions) such that the diagnostic sequencing is limited to particular positions in a target nucleic acid, a feature useful for the unambiguous scoring of previously identified mutations or polymorphisms. The processes described herein can be used, for example, to diagnose any of the more than 3000 genetic diseases currently known (e.g., hemophilias, thalassemias, Duchenne Muscular Dystrophy, Huntington's Disease, Alzheimer's Disease and Cystic Fibrosis) or genetic defects yet to be identified. In addition, certain DNA sequences may predispose an individual to any of a number of diseases or conditions such as diabetes, atherosclerosis, obesity, various autoimmune diseases and cancer (e.g., colorectal, breast, ovarian, lung). Depending on the biological sample, the diagnosis for a genetic disease or genetic predisposition can be performed either pre- or post-natally using the methods of the present invention. Re-sequencing of nucleic acids derived from infectious organisms using the methods of the present invention may reveal the basis of pathogenicity and may also be useful to identify the variation(s) that cause drug-resistance. For example, mutations in the protease/reverse transcriptase region of the human immunodeficiency virus (HIV) have been implicated in the decreased sensitivity towards the antiviral activity of protease and reverse transcriptase (RT) inhibitors. The re-sequencing of the nucleic acid encoding these viral domains is therefore of special interest to monitor disease progression (see Example 1). Similarly, sequencing, according to the present invention, may be useful to determine the antibiotic-resistance phenotype of certain bacteria [e.g. Mycobacterium tuberculosis; Head S. et al., Mol. Cell. Probes 13: 81–87 (1999); Troesch A. et al., J. Clin. Microbiol. 37: 49–55 (1999)].

In other embodiments, the present methods are directed to the identification and classification of target nucleic acids. Analyses according to the present invention characterize nucleic acids at a level essentially equal to sequence determination. Therefore, interrogated unknown sequences may be unambiguously identified by comparison of the obtained mass spectra with those known or predicted for a plurality of reference sequences. In this exercise, novel sequences that have no matching reference database sequence may also be found. The use of the methods for expression profiling (i.e., the analysis of cDNA libraries) as well as whole-genome sequencing is exemplified in Examples 6 and 7, respectively. Other applications include the determination of identity or heredity (e.g., paternity or maternity).

Kits for Practicing the Invention

Kits for diagnostic sequencing of one or more target nucleic acids in a sample are also provided. In preferred embodiments, such kits comprise one or more reference nucleic acids, various reagents for sequence specific cleavage protocols, and computer algorithm(s). Such kits may optionally also contain nucleic acid amplification reagents. Additionally, the kits may contain reagents for the preparation of modified nucleic acids, including but not limited to modified nucleotide substrates. The kits may also contain buffers providing conditions suitable for certain enzymatic or chemical reactions. In addition, the kits may contain reagents, such as solid supports, for purposes of isolating certain nucleic acids and preparing nucleic fragments for mass spectrometric analysis.

The foregoing aspects of the invention are illustrative and should not be construed to limit the invention as set out in the appended claims. Variations in some aspects as well as alternative procedures will be readily recognized by one of ordinary skill in the art.

Example 1 describes modeling the diagnostic sequence analysis of a 1200 base-pair region of HIV-1 using methods of the present invention.

Example 2 describes methods for base-specific cleavage by modifying the nucleic acid template to be cleaved.

Example 3 illustrates the diagnostic sequencing of the RNase-T1 coding region according to the methods of the present invention.

Example 4 illustrates the analysis of a 1000 base-pair nucleic acid.

Example 5 illustrates the use of the present invention for genotyping, including multiplex genotyping.

Example 6 illustrates the use of the present invention for transcription profiling.

Example 7 illustrates the use of the present invention for whole genome resequencing.

EXAMPLE 1 Modeling the Diagnostic Sequence Analysis of a 1200 Base-pair Region of HIV-1

The methods of the present invention have been utilized on a 1200 base-pair sequence derived from human immunodeficiency virus type 1 (HIV-1; HXB2 isolate; Genbank accession number K03455; position 2161 to 3360). This sequence was used as a model in computer simulations to examine the overall performance of the method, as well as the occurrence of ambiguities. The selected region encompasses the entire protease gene and the first ˜270 codons of reverse transcriptase [compare with Hertogs K. et al., Antimicrob. Agents Chemother. 42: 269–276 (1998)]. The genotyping/re-sequencing of this domain of clinical isolates of HIV is of special interest in order to monitor the emergence of drug resistance-associated mutations. Single as well as multiple changes have been implicated in the decreased sensitivity towards the antiviral activity of protease and RT inhibitors [Hertogs K. et al., Antimicrob. Agents Chemother. 42: 269–276 (1998); Schinazi R. et al., Int. Antivir. News 4: 95–107 (1996) and references cited therein].

The principal objective of the computer simulation was to examine the performance of the re-sequencing method for detecting and mapping SNPs. To this end we have performed computational simulation analyses in which we have systematically mutated each nucleotide one by one in the 1200 base-pair sequence. For each mutation we have calculated the molecular masses of the cleavage products that would be generated from a given segment of the sequence in the four different RNase digestion reactions, namely upon RNase-T1 and RNase-U2 cleavage of the (+) and (−) strands. The comparison of these masses with those of the reference cleavage products from the original sequence identifies the masses of the diagnostic fragments associated with each mutational change, i.e., fragments that either appear or disappear as a result of the mutation. The underlying assumption in this analysis was that in order to be measurable, the fragment must have a molecular mass different from those of the other cleavage products generated in the same reaction. Furthermore, we have assumed that the resolution of the mass spectrometric analysis is limited to mass differences larger than either 5 Da or 0.1%. In other words, fragments whose mass difference with other fragments in the same digest is smaller than 5 Da or 0.1% were not scored in the analysis. The quantitative aspects of a mass spectrum (i.e. peak heights) were not considered in the present simulation study. For each mutational change we have computed the number of fragments that are diagnostic for the presence of the mutation. Mutational changes were scored as detectable when there was at least one diagnostic fragment (showing a spectral change). In addition, we have examined whether the mutational changes can also be mapped unambiguously. To this end we have compared the sets of diagnostic fragments associated with each mutation. Mutations that yield unique sets of fragments can be mapped unambiguously, while mutations that give the same sets cannot be distinguished from one another.
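The simulation logic described above may be sketched in Python as follows. This is a hedged illustration, not the simulation actually used: the residue masses are approximate average values, RNase-T1 is modeled as cutting after G and RNase-U2 as cutting after A, the resolution limit is interpreted as whichever of 5 Da or 0.1% is larger, and all function names and the test sequence are hypothetical.

```python
from collections import defaultdict

RESIDUE_MASS = {"A": 329.21, "C": 305.18, "G": 345.21, "U": 306.17}  # approx.
WATER = 18.02
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def masses(seq, base):
    """Fragment masses for a digest that cuts 3' of every `base`."""
    parts = seq.replace(base, base + " ").split()
    return [sum(RESIDUE_MASS[n] for n in p) + WATER for p in parts]


def resolved(ms, min_da=5.0, min_frac=0.001):
    """Keep only masses separated from every other mass in the same digest
    by more than the assumed resolution limit."""
    keep = set()
    for i, m in enumerate(ms):
        limit = max(min_da, min_frac * m)
        if all(abs(m - o) > limit for j, o in enumerate(ms) if j != i):
            keep.add(round(m, 1))
    return keep


def digests(seq):
    """Four reactions: G- and A-specific cleavage of (+) and (-) strands."""
    minus = seq.translate(COMPLEMENT)[::-1]
    return [resolved(masses(s, b)) for s in (seq, minus) for b in "GA"]


def simulate(ref):
    """Return (total substitutions, undetected, unambiguously mapped)."""
    ref_d = digests(ref)
    groups, undetected = defaultdict(list), 0
    for i, old in enumerate(ref):
        for new in "ACGU":
            if new == old:
                continue
            var_d = digests(ref[:i] + new + ref[i + 1:])
            # diagnostic fragments: masses that appear or disappear
            diag = tuple(frozenset(r ^ v) for r, v in zip(ref_d, var_d))
            if any(diag):
                groups[diag].append((i, new))
            else:
                undetected += 1
    mapped = sum(1 for g in groups.values() if len(g) == 1)
    return 3 * len(ref), undetected, mapped


print(simulate("GAUCCGUAAGCUUGCAGGAUCC"))  # hypothetical 22-base segment
```

A substitution counts as mapped only when no other substitution produces the same set of diagnostic masses across all four digests, mirroring the distinction between detection and unambiguous mapping drawn in the text.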

In a first simulation analysis we have computed the fraction of SNPs that may be detected and mapped using respectively 1, 2 and 4 RNase digestion reactions. To this end we have performed a systematic single nucleotide substitution simulation on a 200-base-pair segment of the HIV sequence. For each of the four different RNase digestion reactions [RNase-T1 and RNase-U2 cleavage of the (+) and (−) strands] we have calculated the number of detectable diagnostic fragments and have analyzed whether these fragments are unique for each mutation. The results summarized in FIG. 2 show that in each of the single RNase digest reactions a large fraction (55% to 85%) of the mutations are detected. In contrast, only a small fraction (20% to 30%) of these mutational variations can be mapped unambiguously. The principal reason is that many different mutational changes result in the same mass differences. The fraction of mutations that can be mapped increases to around 60% to 70% when the data of two RNase digest reactions are combined. The further combination of the data from the four different cleavage reactions allows 96% of the mutational changes to be positioned unambiguously and illustrates the advantages of the methods of the present invention. Close inspection of the sequence ambiguities reveals that about half of these involve C to U (or conversely A to G) transitions. Because the difference in molecular mass between C and U residues is only 1 Da, the mass difference in the cleavage products of the strand carrying the pyrimidine base is too small to be detectable. Consequently one might expect that these mutational changes may become detectable when using m⁵U instead of U. Computational simulations using m⁵U on the same 200 base-pair sequence show that the fraction of mutations that can be mapped unambiguously increases to 98%. Consequently all further simulations are based on the use of the analog m⁵U. These results demonstrate that the four mononucleotide-specific RNase digests are both necessary and sufficient for re-sequencing of most sequences with a high degree of accuracy.

It will be obvious that the quality of the sequences obtained with the methods of the invention will be strongly influenced by the size of the sequence segments that are examined. Indeed, the larger the size of the segment, the larger the statistical chance that certain relevant diagnostic fragments may coincide with other cleavage products generated in the same reaction. We have therefore performed a systematic single nucleotide substitution simulation analysis on the 1,200 base-pair HIV sequence using different size segments, namely 100, 200, 300 and 600 base-pairs. In each simulation a total of 3,600 single mutational substitutions was analyzed. For each of the four different RNase digest reactions both the number and the patterns of the measurable diagnostic fragments were computed using the detection limits described above. FIG. 3 shows the distribution of the number of diagnostic fragments obtained with the 3,600 mutational changes in the four different analyses. The results clearly indicate that a larger percentage of the single nucleotide substitutions is associated with fewer diagnostic spectral changes when using larger segments of DNA.

In each simulation we determined both the number of detectable SNPs and the fraction of SNPs that can be mapped unambiguously. The results of the computational simulations summarized in FIG. 4 show that almost all the mutational changes are detected in the four different analyses. Of the 3,600 SNPs, the number that escaped detection was 0, 1, 3 and 9 using 100 base-pair, 200 base-pair, 300 base-pair and 600 base-pair segments, respectively. In contrast, the fraction of mutational variations that can be mapped unambiguously decreases much more when using longer segments. While only 1% of the SNPs are ambiguous when analyzing 100 base-pair segments, that fraction increases to almost 10% with 600 base-pair segments. Close inspection of the ambiguities shows that the majority of these involve nearby (often adjacent) pairs of identical bases where the analysis can determine the nature of the mutation but fails to identify which of the bases is changed.

In conclusion, the results of the simulations show that the methods of the invention are effective for re-sequencing and that even large segments may be used when only a limited number of positions need to be analyzed. Also, it appears that in most cases a computer-aided simulation study will be essential in the experimental design as well as the data interpretation when using the methods of the present invention. Most importantly, the simulations will indicate whether spectral changes are unambiguously linked to particular sequence variations.

EXAMPLE 2 Base-Specific Cleavage by Modification of the Template

The present example illustrates that the specificity of cleavage by a nucleolytic reagent may be further confined through the modification of the target template such that particular phosphodiester bonds resist cleavage. More particularly, it is demonstrated that RNase-A, which normally cleaves at the 3′-side of both C- and U-residues, becomes mononucleotide-specific when the target incorporates the 2′-deoxy analog of one of these nucleotides. A region of the plasmid vector pGEM3-Zf(+) (Promega, Madison, Wis.), encompassing the multi-cloning site as well as the phage T7 promoter sequences, was used as a model (see FIG. 5).

The first step towards the sequence analysis according to the present invention involved the amplification of the 158 base-pair test sequence. The reaction was carried out in a total volume of 50 μl using 12.5 pmol each of the forward and reverse primer, 200 μM of each dNTP, 0.25 μl Taq DNA polymerase (5U/μl; Promega, Madison, Wis.), 1.5 mM MgCl₂ and a buffer supplied with the enzyme. After an initial incubation at 94° C. for 2 min, 40 cycles of the following temperature program were performed: 94° C. for 30 sec, 50° C. for 30 sec, and 72° C. for sec. The sample was kept an additional 15 min at 72° C. and then chilled. The PCR reaction product was purified (High Pure PCR Product Purification Kit; Roche Diagnostics Belgium, Brussels, Belgium) and subsequently used for transcription of one specific strand. A mutant T7 RNA polymerase (T7 R&DNA™ polymerase; Epicentre, Madison, Wis.) with the ability to incorporate both dNTPs and rNTPs was used in the transcription reactions. In addition to a transcription with the regular ribonucleotide substrates, one reaction was performed where CTP was replaced by dCTP, while in two more separate transcriptions either dUTP or dTTP replaced UTP. The transcription reactions were run in a 50 μl volume containing: 40 mM Tris-Ac (pH 8.0), 40 mM KAc, 8 mM spermidine, 5 mM dithiothreitol, 15 mM MgCl₂, 1 mM of each rNTP, 5 mM of dNTP (in these cases the appropriate NTP was excluded), 40 nM DNA template (2 pmol), and 250 units T7 R&DNA™ polymerase. Incubation was performed at 37° C. for 2 hours. After transcription, the full-length T7 in vitro transcripts (118 nucleotides) were purified by allowing them to anneal to the 5′-biotinylated form of the complementary reverse PCR primer (FIG. 5) followed by capture of the biotinylated annealing products onto streptavidin-coated magnetic beads. To this end, 50 pmol biotinylated reverse primer was added to the transcription reactions. The mixtures were first incubated 5 min at 70° C. and, subsequently, 30 min at room temperature. Then, a slight excess of Sera-Mag™ streptavidin magnetic microparticles [Seradyn Inc, Indianapolis, Ind.; resuspended in 50 μl of 2M NaCl, 20 mM Tris-HCl (pH 8.0), 2 mM EDTA] was added and the resultant mixture incubated at room temperature for 30 min with agitation. A magnetic particle collector (MPC; Dynal, Oslo, Norway) was used to collect the beads, remove the supernatant and, subsequently, to wash the beads three times with 100 μl 100 mM (NH₄)₃-citrate. The beads were finally resuspended in 3 μl 25 mM (NH₄)₃-citrate containing 0.5 μg bovine pancreas RNase-A (50U/mg; Roche Diagnostics Belgium, Brussels, Belgium) and incubated at room temperature for about 30 min to digest the transcripts to completion. 1 μl of this RNase reaction was removed and added to 5 μl matrix solution. This 1:1 acetonitrile:H₂O matrix solution is saturated with 3-hydroxypicolinic acid (˜100 mg/ml), and further contains 25 mM (NH₄)₃-citrate, (occasionally) 2 pmol/μl of an oligonucleotide serving as an internal standard, and cation-exchange beads in (NH₄)⁺-form (Dowex 50W-X2; Sigma, Saint-Louis, Mo.) to minimize the presence of sodium and potassium adducts. After incubating the mixture at room temperature for 15 min, 1 μl was put on the sample plate and allowed to dry. Mass spectra were collected using a Reflex III mass spectrometer (Bruker Daltonik GmbH, Bremen, Germany).

The RNase-A cleavage products predicted for each of the four transcripts are shown in Table II. Note that the mass calculation of the predicted fragments assumes a 3′-phosphate group and not the 2′,3′-cyclic phosphate intermediate of the cleavage reaction. Overall, the experimentally obtained spectra (FIG. 6) are in excellent agreement with the predictions. The absence of some of the smallest 3-mers (FIGS. 6A and 6C) may be related to the mass-gate that was applied to eliminate the non-informative mono- and di-nucleotide digestion products. The predicted 3′-proximal fragment TGTTTC (1830.1 Da) is only poorly ascertained in FIG. 6C, i.e., the spectrum deriving from the dU-transcript. This result, along with other observations, suggests that fragments with a relatively high dU-content are detected with a significantly lower sensitivity using the present MS methodology. The 2817 Da peak in FIG. 6D corresponds to the doubly protonated form of the added oligonucleotide. Some of the expected fragments cannot be resolved because they have an identical composition. Also, the digestion products of the regular transcript that differ by one Da only (e.g. the difference between CMP and UMP; Table II) cannot be seen as distinct peaks in FIG. 6A. In total, the data convincingly demonstrate that RNase-A behaves as a C-specific RNase when dTTP or dUTP is substituted for UTP, and as a U-specific reagent when dC rather than C is incorporated into the substrate transcripts. This high level of nucleobase specificity is achieved even under the over-digestion conditions used in the present Example.
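The change in RNase-A specificity reported in this Example can be mimicked with a small Python sketch. The function name `rnase_a_digest` and the test sequence are hypothetical (the sequence is not the pGEM3-Zf(+) transcript), and the model simply assumes that RNase-A never cuts 3′ of a 2′-deoxy residue.

```python
def rnase_a_digest(transcript, deoxy=frozenset()):
    """Cut 3' of every C and U, except bases listed in `deoxy`, which are
    assumed to be incorporated as 2'-deoxy analogs and thus not cleaved."""
    cut_after = {"C", "U"} - set(deoxy)
    fragments, current = [], ""
    for nt in transcript:
        current += nt
        if nt in cut_after:
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments


seq = "GGAUCCUGUUUC"  # hypothetical transcript

print(rnase_a_digest(seq))
# regular transcript -> ['GGAU', 'C', 'C', 'U', 'GU', 'U', 'U', 'C']
print(rnase_a_digest(seq, deoxy={"U"}))
# dU-transcript, C-specific -> ['GGAUC', 'C', 'UGUUUC']
print(rnase_a_digest(seq, deoxy={"C"}))
# dC-transcript, U-specific -> ['GGAU', 'CCU', 'GU', 'U', 'U', 'C']
```

The mono-specific digests yield fewer and longer fragments than the regular digest, so fewer products fall below a mass-gate that removes mono- and di-nucleotides.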

The protocol described in the present Example is illustrative and certain modifications and variations will occur to the skilled artisan. The immobilization of the transcripts represents an easy means to prepare the material for MS analysis, e.g., removal of all other reaction components and exchange of Na⁺ and K⁺ counter-ions for (NH₄)⁺ (note that the subsequent RNase digestion does not require any reagents that are ‘incompatible’ with MS). While other methods, such as chromatography, may be used to prepare the transcripts or the derived digestion products for MS analysis, the present method is favorable in that it is readily amenable to automation and high-throughput analysis. In repeat experiments, yielding essentially the same results as described herein, the transcripts were digested in water and ˜15 nanoliters of these digests were directly applied onto a Spectrochip™ (Sequenom Inc., San Diego, Calif.) for analysis by MALDI-TOF-MS.

EXAMPLE 3 Diagnostic Sequencing of the RNase-T1 Coding Region

The present example illustrates the application of the methods of the invention to the re-sequencing of a portion of the RNase-T1 coding region. We selected the RNase-T1 coding region because of the availability of a collection of site-directed mutants [Steyaert J., Eur. J. Biochem. 247: 1–11 (1997)] which had previously been sequenced using the classical dideoxy chain termination method. The wild-type and mutant sequences used in the present example are shown in FIG. 7.

a. Analysis of the Wild-Type RNase-T1 Sequence

The experiments were performed essentially as described in Example 2. First, the selected wild-type RNase-T1 target sequences were amplified by PCR with the following primers:

-   5′-CCGGATATAAACTTCACGAAGACGG (forward) (SEQ ID NO: 16)
-   5′-GATAGGCCATTCGTAGTAGGGAGAGC (reverse) (SEQ ID NO: 17)

The resultant amplicon was subsequently re-amplified using either a forward or a reverse primer that incorporates the T7 promoter site as a 5′ non-annealing extension (see FIG. 7A):

-   5′-TAATACGACTCACTATAGGGCGACTTCACGAAGACGG (forward) (SEQ ID NO: 18)
-   5′-TAATACGACTCACTATAGGGCGAATTCGTAGTAGGGAGAGC (reverse) (SEQ ID NO: 19)

Subsequently, each of the resultant promoter-appended amplicons was used as template in two separate transcription reactions. The T7 R&DNA polymerase (Epicentre, Madison, Wis.) was used to prepare transcripts that incorporate dCMP or dUMP instead of CMP and UMP, respectively (referred to as the dC- and dU-transcripts). The transcription reactions were carried out as described in Example 2, except that each rNTP was present at 2 mM and incubation was performed overnight at 37° C. The four full-length T7-transcripts were purified by annealing with a biotinylated oligonucleotide that matches the transcript 3′-end (i.e., the biotinylated form of either the forward or the reverse PCR primer used in the first amplification step) and subsequent capture onto streptavidin microparticles. After extensive washing with (NH₄)₃-citrate, the transcripts were eluted: the beads were resuspended in 3 μl of water and kept at 90° C. for 2 min, immediately followed by collection of the beads with the magnet and transfer of the supernatant to a fresh tube. Then, the purified transcripts were digested to completion by the addition of 1 μl of 100 mM (NH₄)₃-citrate containing RNase-A. Finally, the reaction products were analyzed by MALDI-TOF-MS.

A graphical representation of the spectra is shown in FIGS. 8A–D. The predicted degradation products are listed in Table III. As with the pGEM3-Zf(+) transcripts, the obtained spectra are in good agreement with the predictions. A few peaks that are most likely the result of double protonation were also observed (see FIG. 8B). The T-reaction on the (−) strand suggests the occurrence of transcripts with an extra non-template encoded nucleotide at the 3′-end [Milligan J. et al., Nucleic Acids Res. 15: 8783–8798 (1987)]. Indeed, in addition to the expected 3′-terminal fragment, a prominent peak is observed that coincides with the same fragment containing an extra G-residue (FIG. 8D and Table III). The absence of the expected 3′-terminal fragment from the C-reaction on the (+) strand (1153 Da; FIG. 8A) may be explained by this same phenomenon. In this case, cleavage of the 3′-extended transcript would occur and result in the 3′-phosphorylated (rather than the 3′-OH) form of the predicted fragment, a product which would coincide with another fragment of the same digestion (1233.7 Da; Table III).
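
The prediction of base-specific cleavage products can be sketched in a few lines of code. The following is a minimal illustration, not the software used for Table III; the function name `digest` is hypothetical. It assumes that cleavage occurs 3′ of every occurrence of the chosen base and uses only the first 20 nucleotides of the (+) strand transcript, i.e., the part of the forward T7 primer (SEQ ID NO: 18) downstream of the promoter, so the last fragment of each list is truncated.

```python
def digest(transcript: str, cut_base: str) -> list[str]:
    """Split a transcript 3' of every occurrence of cut_base."""
    fragments, current = [], ""
    for base in transcript:
        current += base
        if base == cut_base:
            fragments.append(current)
            current = ""
    if current:
        # Remainder that does not end in the cut base (3'-terminal piece)
        fragments.append(current)
    return fragments


# 5' end of the (+) strand transcript, taken from SEQ ID NO: 18
# downstream of the T7 promoter (T denotes U or dU in the transcript).
five_prime_end = "GGGCGACTTCACGAAGACGG"

c_reaction = digest(five_prime_end, "C")  # C-specific cleavage (dU-transcript)
t_reaction = digest(five_prime_end, "T")  # T(U)-specific cleavage (dC-transcript)

print(c_reaction)
print(t_reaction)
```

The fragments GGGC, GAC, TTC and GAAGAC from the C-reaction and GGGCGACT from the T-reaction appear in the corresponding sections of Table III; the di-nucleotide AC and the mono-nucleotide T fall below the 3-mer cut-off used there.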

b. Analysis of Selected RNase-T1 Single Point Mutations

Four single nucleotide substitutions were chosen (mutant #1, #2, #3, and #4 in FIG. 7B). Each of the mutant sequences was analyzed as described for the wild-type RNase-T1 coding region (Example 3a). The results are summarized in Table IV. Table IV shows, for each mutation, which 5 fragments of the wild-type RNase-T1 reference sequence are affected by the mutation as well as the 5 fragments that are mutation-specific. It also shows which changes are missing and, consequently, on how many of the ten theoretical data points the mutation identification is actually based. Spectral changes are missing because they involve fragments that are too small (<3-mer) or not unique. Also, a few fragments were not experimentally observed, e.g., one 3-mer as well as the largest fragments with a mass of ≧9.8 kDa. Of particular interest are the results concerning mutation #2, which best illustrate the present invention. In this particular case, all four mono-nucleotide specific cleavage reactions result in the detection of a mutation, i.e., one will notice that the sequence differs from the wild-type RNase-T1 coding region. However, none of these reactions, when taken alone, leads to the unambiguous mapping of the mutation. The C-reaction on the (+) strand results in a new fragment of 1947 Da. The single nucleotide mutation #2 is not the only change that can explain the creation of such a 6-mer [composition=A₃G(dU)C]. For example, this is also the case for a double mutation that converts the sequence CTACTAC into CAAGTAC (see FIG. 7); the TAC peak will not be lost because of the presence of a third such 3-mer. The T-reaction on the (+) strand results in a spectrum where the mass of one fragment has increased by 56 Da when compared to the reference spectrum. This suggests the replacement of a dC by a G. Because the cleavage product contains three dC residues, it is not possible to position the substitution. The C-reaction on the (−) strand is at first sight the most informative; a large reference fragment is affected by the cleavage. The sequence of the fragment (GTAG₁TT…TG₂GATC) (SEQ ID NO: 20) is, however, such that both the G₁->C and the G₂->C mutation can explain the observed products of 9814 Da and 1289 Da [composition=GA(dU)C]. Finally, the T-reaction on the (−) strand is the least informative, and the appearance of a peak of 944 Da [A(dC)U] can be explained in many different ways. An A(dC)U-fragment is, for example, generated by replacement of the T₁-residue by a C in the sequence stretch TAT₁TT (see FIG. 7). In conclusion, mutation #2 exemplifies that in some cases the nature and position of a sequence variation may only be determined by a combination of at least two different complementary cleavage reactions.

c. Analysis of a Mixture of Wild-Type and Mutant RNase-T1 Sequences

The analyses shown in Table IV can be used to simulate experiments where equimolar mixtures of the wild-type RNase-T1 sequence and one of the single nucleotide substitutions are examined. In such cases, which mimic heterozygous genotypes, the spectra contain a number of novel fragments in addition to all those derived from the (wild-type) reference sequence. The characterization and location of the mutation/polymorphism is therefore necessarily based on the novel fragments only. Unambiguity requires that the novel fragments are sufficient to uniquely define the mutation. Those of skill in the art will realize that zygosity determination is straightforward using the present methods because each allele is associated with a distinct set of peaks.

We performed a number of experiments where one particular single nucleotide mutant (e.g., mutant #3; FIG. 7B) was mixed with wild-type RNase-T1 such that the mutant allele was present at the following fractions: 1:2, 1:5, 1:10, 1:20, 1:50, 1:100, 1:200, 1:500 and 1:1000. The experiment mimics the analysis of pools of samples characterized by different allele frequencies. First, equivalent quantities of the wild-type and mutant target sequences were synthesized by PCR amplification using conditions where the primers are limiting and completely consumed. After mixing the two amplicons in the desired ratios, the material was re-amplified. Then, transcripts of the (−) strand were prepared and digested as described above, except that transcriptions were performed using all four nucleotide triphosphate substrates in the ribo-form (rNTPs) and that cleavage was carried out with RNase-T1 instead of RNase-A. Each of the digestion reactions was measured 5 times. Cleavage with the RNase-T1 enzyme generates a polymorphic 15-mer fragment which reads: AAAUCAAAACCUUCG (SEQ ID NO: 21), where the underlined residue is changed to A by mutation #3 (refer to FIGS. 7A and 7B). The masses of the wild-type and the mutant fragment are 4807.91 Da and 4830.95 Da, respectively; the mutation causes a shift of 23 Da. We found that there was an excellent linear correlation between the allele frequencies and the relative peak heights (R²=0.97) and that the peak associated with the mutant allele could still be identified with confidence when it represented 5–10% of the material. It should be noted that in other experiments the minimum ratio of mutant over wild-type allele that can be detected might be significantly lower. Indeed, in the present example, the reliable detection of the ‘mutant peak’ was somewhat encumbered by the occurrence of an extra peak, as evidenced by the control spectrum recorded for the wild-type target nucleic acid. This extra peak may possibly be attributed to a low level of Na⁺-adduct of the wild-type fragment (22 Da mass shift). In all, the latter data indicate that homologous target nucleic acids can be pooled and analyzed simultaneously; in addition to revealing certain sequence variations, the methods of the present invention may permit the allele frequencies to be estimated among the pool of biological samples. While diagnostic sequence determination as disclosed herein relies primarily on the appearances and disappearances of peaks as well as peak shifts, the present example indicates that certain quantitative aspects of a spectrum (e.g., peak height and peak area) can be included in the sequence analysis and yield complementary valuable information.
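
The quantitative use of peak heights described above can be sketched as follows. This is an illustrative calculation only: the function names are hypothetical, the peak heights in the usage lines are invented placeholder numbers rather than measured data, and the estimate assumes that both fragments ionize and are detected with equal efficiency. The 23 Da mutant shift is computed from the two masses given in the text; it lies close to the 22 Da shift of a Na⁺-adduct, which is why the two peaks can interfere.

```python
def mutant_fraction(wt_height: float, mut_height: float) -> float:
    """Estimate the mutant allele fraction from two peak heights."""
    total = wt_height + mut_height
    if total <= 0:
        raise ValueError("peak heights must sum to a positive value")
    return mut_height / total


def r_squared(xs: list[float], ys: list[float]) -> float:
    """Squared Pearson correlation between expected and observed values."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("values must not be constant")
    return sxy ** 2 / (sxx * syy)


# Masses of the polymorphic 15-mer as stated in the text.
WT_MASS, MUT_MASS = 4807.91, 4830.95
mutation_shift = round(MUT_MASS - WT_MASS, 2)

# Placeholder peak heights (arbitrary units), for illustration only.
print(mutation_shift)
print(mutant_fraction(wt_height=90.0, mut_height=10.0))
```

Comparing the estimated fractions with the known mixing ratios by means of `r_squared` would give a figure comparable in kind to the correlation reported above.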

d. Analysis of RNase-T1 Multiple Mutants

The methods of the present invention are not limited to the analysis of single nucleotide substitutions. Complex variations can also be sequenced. Table IV lists the spectral changes that are predicted to be associated with a number of RNase-T1 multiple mutants, more particularly double and triple mutants (mutant #5, #6, #7, and #8 in FIG. 7B). As described above, multiple mutants are associated with a characteristic number of spectral changes. In the case of multiple substitutions, with no deletions or insertions involved, the number of affected reference fragments is always identical to the number of novel fragments. For double mutants the number of spectral changes ranges from 12, in case the mutations are adjoining (mutant #5), to a maximum of 20, in case the mutations are separated by a sequence that contains at least one A, G, C, and T. In the latter case, the double mutant is to be treated as two concurrent but independent single nucleotide substitutions. Triple mutants are associated with a minimum of 14 spectral changes (mutant #7). As with single nucleotide substitutions, not all the theoretical spectral changes can or may be observed and part of the information will be lost. In the vast majority of cases, however, a systematic computational analysis, based on the obtained spectra and the reference nucleic acid sequence(s), can unambiguously identify and locate the sequence variations.

EXAMPLE 4 Mass Spectrometric Analysis of a ˜1000 Base-Pair Region

The methods of the invention are designed to overcome the limitation of the short read lengths encountered with current MS-based sequencing methodologies that involve the analysis of fragment-ladders. One can envision that, depending on the application, target regions of several hundred or even a few thousand base-pairs can be analyzed. The present example demonstrates that a large number of oligonucleotide fragments can be analyzed simultaneously by the methods of the present invention and that, consequently, the detection platform does not impose a limit on the methodology.

Following the scheme presented in Example 2, a 1012 base-pair region of the plasmid vector pGEM3-Zf(+) (Promega, Madison, Wis.) was amplified and the resultant amplicon was subsequently used for preparation of a 972 nucleotides long in vitro T7 transcript (see FIG. 5). The transcript incorporated dCMP instead of CMP such that a U-specific cleavage could be performed by RNase-A. The cleavage products predicted for this transcript are listed in Table V. FIG. 9 shows the most relevant parts of the experimentally obtained spectrum. The primary conclusion from the experimental data is that complex mono-nucleotide specific digestion reactions, consisting of >200 cleavage products, can be analyzed by mass spectrometry. The vast majority of the about 67 predicted distinct peaks are readily identified. Only a few of the 4-mer fragments are not, or only barely, detectable. It also appears that in the present experiment the assignment of some peaks requires the assumption that (at least a portion of) certain digestion products contain a 2′,3′-cyclic phosphate instead of a 3′-phosphate group. Such peaks differ from the parent peaks by −18 Da. It is well known that cyclic phosphates result from the transesterification cleavage reaction and that these intermediates get hydrolyzed in a slower second reaction step.
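
The masses in Table V follow from the fragment composition, which can be illustrated with a small calculation. The per-residue increments below are not taken from an external source; they are back-calculated from Table V itself (T = 325.2; CT − T = 289.2; AT − T = 329.2; GT − T = 345.2) and therefore apply only to this dC-transcript with 3′-phosphate fragments ending in U. The function name and constants are hypothetical, and the results agree with the table only to within its rounding (about 0.1 Da).

```python
# [M + H]+ of the terminal uridine 3'-phosphate (row "T" of Table V).
TERMINAL_U = 325.2

# Mass added per internal residue, derived from Table V differences.
# "C" is dCMP here because the transcript incorporates dCMP.
INCREMENT = {"A": 329.2, "C": 289.2, "G": 345.2}

# Shift of the 2',3'-cyclic phosphate intermediate relative to the
# 3'-phosphate product, as stated in the text.
CYCLIC_PHOSPHATE_SHIFT = -18.0


def fragment_mass(fragment: str, cyclic: bool = False) -> float:
    """Approximate [M + H]+ of a U-specific cleavage product."""
    if not fragment.endswith("T") or "T" in fragment[:-1]:
        raise ValueError("expected a fragment with a single, 3'-terminal T")
    mass = TERMINAL_U + sum(INCREMENT[base] for base in fragment[:-1])
    if cyclic:
        mass += CYCLIC_PHOSPHATE_SHIFT
    return round(mass, 1)


print(fragment_mass("ACGT"))               # composition ACGT in Table V
print(fragment_mass("AAGCGGGT"))           # composition A2CG4T in Table V
print(fragment_mass("ACGT", cyclic=True))  # companion cyclic phosphate peak
```

The order of the internal residues does not affect the result, which is why Table V groups the fragments by composition rather than by sequence.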

EXAMPLE 5 Genotyping

The methods of the present invention are also useful for the diagnostic sequencing of multiple non-contiguous regions of a sample nucleic acid. This renders the present methods useful for the genome-wide discovery as well as the routine scoring of polymorphisms (e.g., SNPs) and mutations at multiple loci in genomic DNA. Such multiplex genotyping is conceptually no different from re-sequencing; both require that alterations are characterized and positioned unequivocally. Similar to the experiments involving a single target sequence described above, a computer simulation can be performed to find out which of the observed spectral changes are uniquely linked to particular genomic alterations. Since multiplex genotyping only requires the identification/diagnosing of a number of variant positions, it will be recognized by those of skill in the art that (i) the complexity (i.e., the combined length) of the multiple target sequences may be significantly greater than in the case of full re-sequencing, and (ii) a single specific cleavage reaction may often suffice for both allele and zygosity identification. Applications which involve the use of two sequence-specific cleavages that each positively identify one of the two alternative forms of a series of bi-allelic SNPs are also possible using the methods of the present invention. For example, many C to T transitions, the most common type of point mutations and polymorphisms in humans, may be easily scored by a combination of C- and T(U)-specific reactions. It is worth mentioning that heterozygous samples analyzed using gel-electrophoretic sequencing are often difficult to identify with confidence. With the methods described herein, the detection of heterozygosity is unambiguous because of the presence of both the wild-type and the mutation-specific set of mass spectral peaks.

Multiplex genotyping will generally involve the co-amplification of genomic regions. In the case of previously known SNP genetic markers, co-amplification of selected loci can be achieved by using dedicated primer pairs [Wang et al., Science 250: 1077–1081 (1998)]. Alternatively, a more generic approach can be adopted for both the discovery and the subsequent routine scoring of a set of SNPs where the preparation of target sequences comprises the concomitant amplification of multiple short restriction fragments derived from the sample nucleic acid. This ‘random sampling’ method may be particularly useful with organisms that have a high polymorphism content (e.g., more than 1 SNP in 100 base-pairs). This co-amplification can be achieved by ligating to the ends of the restriction fragments adaptor sequences that incorporate the target sites for a single PCR primer pair. In this approach, the average size of the amplicons must be small such that the majority incorporates ≧1 SNP while, additionally, the total number of the amplicons must be sufficiently small so that their combined length is amenable to analysis by the present methods. These requisites can be met by the appropriate choice of restriction enzymes and the use of methods that permit the selective amplification of discrete subsets of restriction fragments [Vos P. et al., Nucleic Acids Res. 23: 4407–4414 (1995); Zabeau M. and Vos P., EP 0534858 (1993); Kikuya Kato, Nucleic Acids Res. 23: 3685–3690 (1995)] and as described herein. For example, a first restriction enzyme that cleaves rarely in the genome under study can be combined with a second reagent that generates fragments with an average size of about 100 base-pairs (e.g., a combination of two enzymes with tetra-nucleotide recognition sites). The number of fragments bounded by the two different restriction sites should preferably be less than 100,000; a suitable subset of these can readily be amplified by the use of selective primers [Vos P. et al., Nucleic Acids Res. 23: 4407–4414 (1995)]. In addition, a PCR protocol characterized by a greatly shortened elongation time can be used such that the amplification of short fragments is strongly favored, thereby further reducing the number and the average size of the amplicons. During the selective co-amplification of genomic fragments or in a subsequent PCR step, a first primer can be used that attaches a full promoter sequence (e.g., one deriving from bacteriophage T7, T3 or SP6; supra) to the amplicons. The second strand may be synthesized by extension of a primer that contains a ribonucleotide residue at, for example, the penultimate position. Following PCR amplification, the primer sequences can be removed from this second strand by RNase digestion, and the resultant truncated strand transcribed with the aid of the first primer. This procedure minimizes the common sequences that are connected to the target restriction fragments.

EXAMPLE 6 cDNA Library Analyses—Transcription Profiling

Diagnostic sequencing will generally be performed on a defined nucleic acid, i.e., one will know to what reference sequence the target nucleic acid corresponds. However, the re-sequencing methods according to the present invention can also be used to identify or classify certain sequences. In such experiments, the interrogated nucleic acid (e.g., a random clone of DNA) will typically correspond to an unknown portion of a (much) larger sample sequence or represent one out of a plurality of nucleic acids present in a biological sample, or a combination of both. The mass spectra derived from the unknown nucleic acid are compared to those known or predicted for the related reference sequence(s), or portions thereof. Note that, in this type of experiment, some of the interrogated target sequences need not necessarily have their counterparts in the reference sequences, and vice versa. It will be realized that sequence identification according to the present methods may, at the same time, reveal possible sequence variations. Interrogated sequences may thus be classified as identical to one of the database sequences, as a variant of such a reference sequence, or as novel in case no matching sequence is found.

It should be recognized that analyses that involve at least the four complementary mono-nucleotide specific cleavage reactions identify unknown sequences with a resolution essentially equal to sequence determination. At the same time, the MS-based methods described herein allow fast data acquisition and are amenable to high-throughput analysis. Therefore, the present methods are useful to identify and catalogue nucleic acids at an unprecedented scale and speed. One application consists of the analysis of cDNA libraries for the purpose of: (i) the assembly of unigene libraries (i.e., the identification/removal of replicate clones), (ii) the identification of novel genes or novel variants of previously identified genes, and (iii) transcription profiling. The speed and throughput of the present method should permit the processing of more clones and, hence, a more in-depth analysis of a cDNA library.

A variety of methods are known in the art for transcription profiling, i.e., the analysis of the transcription in both qualitative and quantitative terms. In one method, the expressed-sequence-tag (EST) approach, the mRNA population is assessed by partial sequencing of randomly selected cDNA clones. Global changes in gene-expression patterns are deduced from the EST ratios among two compared cDNA libraries [Lee N. et al., Proc. Natl. Acad. Sci. USA 92: 8303–8307 (1995)]. The methods described herein may be used to catalogue expressed genes with a similar level of resolution but considerably higher speed and throughput. First, a library of unidirectionally cloned cDNAs is constructed in a vector that permits transcription of the inserted sequences. Preferably, the 3′-end of the cDNAs is located adjacent to the promoter. Template for transcription can be prepared by amplification of the promoter-cDNA cassette using a pair of vector-specific primers. Alternatively, vector DNA is prepared and cleaved at a restriction site within the vector close to the 5′-end of the inserted cDNA (e.g., ~25 base-pairs). Preferably, the restriction site at which the templates are cleaved should have a low occurrence frequency within the cDNAs under study. Run-off transcripts, synthesized from PCR product or digested vector, are characterized by a common 3′-end, consisting of vector sequences, which allows the isolation of full-length transcripts as described in Example 2. An alternative strategy involves treatment of the vector DNAs with a restriction reagent such that not only are all templates digested at the cDNA 5′-end but a vast majority is also cleaved within the cDNA at some distance from the 3′-end (e.g., a few hundred base-pairs). The restriction reagent may be a single enzyme or a combination of two or more restriction enzymes. Ligation of an adaptor to the digestion product(s) [see Vos P. et al., Nucleic Acids Res. 23: 4407–4414 (1995)] can be considered so as to obtain full-length transcripts with a common 3′-end enabling their isolation as described in Example 2. However, transcripts that incorporate a biotin group at the 5′-end may also be prepared [Hahner S. et al., Nucleic Acids Res. 25: 1957–1964 (1997)], providing an alternative means for their immobilization. Digestion within the cDNAs is an attractive option in that different partial cDNAs deriving from the same transcript are made congruent by this procedure and are thereby easy to identify. The full-length run-off transcripts are finally subjected to complementary sequence-specific cleavage reactions, and the resultant digestion products analyzed by MS as disclosed herein.

Those of skill in the art will recognize the advantages of the transcript profiling method outlined above. Comparable to the EST approach, cDNAs are identified at the sequence level, i.e., the ultimate level of resolution. Thus, while the method involves fragmentation of the interrogated nucleic acid, its level of resolution far exceeds that attained by fingerprinting techniques [Prashar Y. and Weissman S., Proc. Natl. Acad. Sci. USA 93: 659–663 (1996); Bachem C. et al., The Plant Journal 9: 745–753 (1996); Ivanova N. and Belyavsky A., Nucleic Acids Res. 23: 2954–2958 (1995); Liang P. and Pardee A., Science 257: 967–971 (1992)]. In contrast to hybridization-based approaches [Schena M. et al., Science 270: 467–470 (1995); Wodicka L. et al., Nature Biotechnology 15: 1359–1367 (1997)], the method can identify both known and previously unknown sequences. Also, it should prove faster than methods requiring gel-electrophoretic fractionation.

EXAMPLE 7 Whole-Genome Re-Sequencing

In the past couple of years the technology for sequencing entire genomes, especially those of microorganisms, has come to maturity. More than 50 microbial genomes are scheduled to be completed by the year 2000, and the benefits emerging from this vast body of knowledge are rapidly becoming clear [Clayton R. et al., Curr. Opinion Microbiol. 1: 562–566 (1998)]. It seems clear that sequencing entire microbial genomes is becoming routine and that microbial genetics is entering the era of ‘comparative genomics’. Knowledge of the complete genome sequence is the ultimate tool in phylogenetic analyses, allows gene/functional diversity studies, and fundamentally changes the manner in which research is conducted in an organism. At the present time, a substantial portion of each new genome sequence has no database match. One may expect to see a greater proportion of orthologous genes in the future, when the microbial species diversity is better represented. At that point, when most of the sequences generated will be similar to already known sequences, global genome analyses could be performed rapidly, accurately, and cost-effectively using a re-sequencing strategy as described herein rather than by de novo sequencing methods. Similar evolutions may be anticipated outside the bacterial genetics field where genome projects for many (model) organisms are ongoing or have already been finished (e.g., Drosophila melanogaster, Caenorhabditis elegans, human, mouse, Arabidopsis thaliana, and rice).

The methods of the present invention may be readily adapted to the re-sequencing of entire (bacterial) genomes or megabase nucleic acid regions. This may be accomplished with the use of a shotgun approach that involves the sequence analysis of unselected subclones that harbor random fragments according to the methods of the present invention. The assembly of all the independent, random sequences is fundamentally different from that in a de novo sequencing project [Fleischmann R. et al., Science 269: 496–512 (1995)] because of the availability of a reference sequence that serves as a scaffold. The assembly into a single complete sequence comes down to matching each set of experimentally obtained spectra with a portion of the reference sequence. The computational approaches required to accomplish this are similar to those that are needed for the analysis of cDNA libraries, outlined in Example 6. In both cases one does not know in advance the reference sequence, if one exists at all, for a given interrogated target region. It should be noted, however, that the present shotgun approach might be even more demanding in terms of computational power because of the undefined ends of the segments. At the same time, the algorithms must be capable of mapping the variations that occur between the target and the reference sequence. It is expected that a shotgun approach with its built-in redundancy (i.e., most sequences will be covered several-fold) should prove useful for the comprehensive comparison of a pair of related genomes. An alternative to the shotgun strategy consists of the analysis of clones from one or more libraries of restriction enzyme fragments or the analysis of defined amplicons generated with locus-specific primer pairs.

While the present invention has been described in terms of the preferred embodiments, it is understood that variations and modifications will occur to those skilled in the art. Therefore, it is intended that the appended claims cover all such equivalent variations which come within the scope of the invention as claimed. All of the references cited herein are expressly incorporated by reference.

TABLE I Detection of the twelve possible point mutations that can occurin DNA by the methods of the present invention. Each substitution isassociated with the loss (− sign) and gain (+ sign) of a cleavage site.In addition, each mutation affects the mass of two digestion products asindicated. Mass differences shown in bold face result from theincorporation of m⁵U in both transcripts (see text for details).Mutation RNase T1 RNase U2 (+) (−) (+) (−) (+) (−) strand strandtranscript transcript transcript transcript transitions A−>G T−>C +  −1Da −  −1 Da −15 Da −15 Da G−>A C−>T −  +1 Da +  +1 Da +15 Da +15 Da T−>CA−>G  −1 Da +  −1 Da − −15 Da −15 Da C−>T G−>A  +1 Da −  +1 Da + +15 Da+15 Da transversions A−>C T−>G −24 Da + − +39 Da +25 Da C−>A G−>T +24 Da− + −39 Da −25 Da T−>G A−>C + −24 Da +39 Da − +25 Da G−>T C−>A − +24 Da−39 Da + −25 Da T−>A A−>T +23 Da −23 Da + −  +9 Da  −9 Da A−>T T−>A −23Da +23 Da − +  −9 Da  +9 Da C−>G G−>C + − +40 Da −40 Da G−>C C−>G − +−40 Da +40 Da

TABLE II RNase-A digestion products predicted for four different pGEM3-Zf(+) derived transcripts. The ≧3-mer fragments are ranked according to their molecular masses. The regular transcript was prepared with rNTP substrates. Transcripts that incorporate dTMP, dUMP, or dCMP are denoted as dT-, dU-, or dC-transcript. Fragments containing a 5′-triphosphate (5′ppp-) are indicated.

| regular transcript: expected fragments | mass (M⁺) | dT-transcript: expected fragments | mass (M⁺) | dU-transcript: expected fragments | mass (M⁺) | dC-transcript: expected fragments | mass (M⁺) |
|---|---|---|---|---|---|---|---|
| CAT | 959.6 | TGC | 973.6 | TGC | 959.6 | CCT | 903.5 |
| AAT | 983.6 | GAC | 998.6 | GAC | 998.6 | CAT | 943.6 |
| AGC | 998.6 | ATGC | 1302.8 | ATGC | 1288.8 | CAT | 943.6 |
| AGC | 998.6 | AAGC | 1327.8 | AAGC | 1327.8 | AAT | 983.6 |
| GAC | 998.6 | GAGC | 1343.8 | GAGC | 1343.8 | AGT | 999.6 |
| AGT | 999.6 | AGGC | 1343.8 | AGGC | 1343.8 | GGT | 1015.6 |
| GGC | 1014.6 | 5′ppp-GGGC | 1599.7 | TTGGC | 1594.9 | AGCT | 1288.8 |
| GGT | 1015.6 | TTGGC | 1623.0 | 5′ppp-GGGC | 1599.7 | AGCT | 1288.8 |
| GGT | 1015.6 | ATAGC | 1632.0 | ATAGC | 1618.0 | CGGT | 1304.8 |
| AAAT | 1312.8 | GGTAC | 1648.0 | GGTAC | 1634.0 | AAAT | 1312.8 |
| AAGC | 1327.8 | TGTTTC | 1886.2 | TGTTTC | 1830.1 | GAGT | 1344.8 |
| GAAT | 1328.8 | GAATTC | 1936.2 | GAATTC | 1908.1 | CACCT | 1521.9 |
| GAGC | 1343.8 | GTAATC | 1936.2 | GTAATC | 1908.1 | GGCGT | 1650.0 |
| AGGC | 1343.8 | ATGGTC | 1952.2 | ATGGTC | 1924.1 | AGAGT | 1674.0 |
| GAGT | 1344.8 | TAGAGTC | 2281.4 | TAGAGTC | 2253.4 | CGACCT | 1867.1 |
| 5′ppp-GGGC | 1599.7 | GGGGATC | 2338.4 | GGGGATC | 2324.4 | CGAGCT | 1923.2 |
| AGAGT | 1674.0 | TAAATAGC | 2594.6 | TAAATAGC | 2566.6 | GCAAGCT | 2252.4 |
| GGGGAT | 2035.2 | TATAGTGTC | 2889.8 | TATAGTGTC | 2833.7 | GCAGGCAT | 2597.6 |
|  |  | TTGAGTATTC (SEQ ID NO: 22) | 3194.0 | TTGAGTATTC (SEQ ID NO: 22) | 3123.9 | 5′ppp-GGGCGAAT | 2893.5 |
|  |  |  |  |  |  | ACCCGGGGAT (SEQ ID NO: 23) | 3232.0 |

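TABLE III RNase-A digestion products predicted for the dU- and dC-transcripts of the (+) and (−) strands of the RNase-T1 coding region. Only the ≧3-mers are shown. Cleavage of the dU-transcript is C-specific. Likewise, the T-reaction is performed on the dC-transcript. Two fragments, shown in italics, assume the occurrence of 3′-extended transcripts (refer to Example 3).

| Reaction | Fragment | [M + H]⁺ |
|---|---|---|
| (+) strand/C-reaction | TTC | 904.5 |
| (+) strand/C-reaction | TAC | 943.6 |
| (+) strand/C-reaction | TAC | 943.6 |
| (+) strand/C-reaction | TAC | 943.6 |
| (+) strand/C-reaction | AAC | 982.6 |
| (+) strand/C-reaction | AAC | 982.6 |
| (+) strand/C-reaction | GAC | 998.6 |
| (+) strand/C-reaction | TATC-OH3′ | 1153.7 |
| (+) strand/C-reaction | TATC-p3′ | 1233.7 |
| (+) strand/C-reaction | TTAC | 1233.7 |
| (+) strand/C-reaction | AATTC | 1562.9 |
| (+) strand/C-reaction | 5′ppp-GGGC | 1599.7 |
| (+) strand/C-reaction | AAATAC | 1931.2 |
| (+) strand/C-reaction | GAAGAC | 2002.2 |
| (+) strand/C-reaction | TGTGAGC | 2269.4 |
| (+) strand/C-reaction | GAATGGC | 2308.4 |
| (+) strand/C-reaction | GGTGAAAC | 2637.6 |
| (+) strand/C-reaction | TGTTGGATC | 2849.7 |
| (+) strand/C-reaction | GAAGGTTTTGATTTC (SEQ ID NO: 24) | 4723.8 |
| (+) strand/T-reaction | ACT | 943.6 |
| (+) strand/T-reaction | GAT | 999.6 |
| (+) strand/T-reaction | CCCT | 1192.7 |
| (+) strand/T-reaction | GGAT | 1344.8 |
| (+) strand/T-reaction | CCAAT | 1562.0 |
| (+) strand/T-reaction | GGCCT | 1594.0 |
| (+) strand/T-reaction | GAGCT | 1634.0 |
| (+) strand/T-reaction | GAAACT | 1947.2 |
| (+) strand/T-reaction | ACGAAT | 1947.2 |
| (+) strand/T-reaction | ACGAAGGT | 2637.6 |
| (+) strand/T-reaction | ACAACAACT | 2838.8 |
| (+) strand/T-reaction | 5′ppp-GGGCGACT | 2853.5 |
| (+) strand/T-reaction | ACCCACACAAAT (SEQ ID NO: 25) | 3746.4 |
| (+) strand/T-reaction | CACGAAGACGGT (SEQ ID NO: 26) | 3890.4 |
| (−) strand/C-reaction | TTC | 904.5 |
| (−) strand/C-reaction | TTC | 904.5 |
| (−) strand/C-reaction | GTC | 959.6 |
| (−) strand/C-reaction | AAC | 982.6 |
| (−) strand/C-reaction | 5′ppp-GGGC | 1599.7 |
| (−) strand/C-reaction | AAAAC | 1641.0 |
| (−) strand/C-reaction | AGTTTC | 1869.1 |
| (−) strand/C-reaction | GAATTC | 1908.1 |
| (−) strand/C-reaction | AGAGAAATC | 2950.8 |
| (−) strand/C-reaction | GTGAAGTTTATATC (SEQ ID NO: 27) | 4417.7 |
| (−) strand/C-reaction | GTAGTAGGGAGAGC (SEQ ID NO: 28) | 4637.8 |
| (−) strand/C-reaction | GTAGTTGTTGTATTTGTGTGGGTAAGAATTGGATC (SEQ ID NO: 20) | 11123.7 |
| (−) strand/T-reaction | CGT | 959.6 |
| (−) strand/T-reaction | CGT | 959.6 |
| (−) strand/T-reaction | CGT | 959.6 |
| (−) strand/T-reaction | AGT | 999.6 |
| (−) strand/T-reaction | AGT | 999.6 |
| (−) strand/T-reaction | CCGG-OH3′ | 1207.8 |
| (−) strand/T-reaction | CCGGG-OH3′ | 1553.0 |
| (−) strand/T-reaction | GGAT | 1344.8 |
| (−) strand/T-reaction | GGGT | 1360.8 |
| (−) strand/T-reaction | GAAGT | 1674.0 |
| (−) strand/T-reaction | CACCGT | 1867.1 |
| (−) strand/T-reaction | AAGAAT | 1987.2 |
| (−) strand/T-reaction | CAAAACCT | 2509.6 |
| (−) strand/T-reaction | CCAACAGT | 2525.6 |
| (−) strand/T-reaction | 5′ppp-GGGCGAAT | 2893.5 |
| (−) strand/T-reaction | AGGGAGAGCT (SEQ ID NO: 29) | 3328.0 |
| (−) strand/T-reaction | CACAGAGAAAT (SEQ ID NO: 30) | 3569.2 |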

TABLE IV Spectral changes associated with single and multiple mutations in the RNase-T1 coding region. For each reaction, the affected reference fragments (sequence, [M + H]⁺, comments) are followed by the novel fragments ([M + H]⁺, comments).

mutation #1 [A −> T on (+) strand/T −> A on (−) strand]
(+)/C  affected: TTAC 1233.7 (not unique)
       novel: 1194.7
(+)/T  affected: ACCCACACAAAT (SEQ ID NO: 25) 3746.4
       novel: 325.2 (<3-mer); 3417.1
(−)/C  affected: GTAGTTGTTGTATTTGTGTGGGTAAGAATTGGATC (SEQ ID NO: 20) 11123.7 (not observed)
       novel: 11162.7 (not observed)
(−)/T  affected: GGGT 1360.8; AAGAAT 1987.2
       novel: 3352.1

mutation #2 [C −> G on (+) strand/G −> C on (−) strand]
(+)/C  affected: AAC 982.6 (not unique); TAC 943.6 (not unique)
       novel: 1947.2
(+)/T  affected: ACAACAACT 2838.8
       novel: 2894.8
(−)/C  affected: GTAGTTGTTGTATTTGTGTGGGTAAGAATTGGATC (SEQ ID NO: 20) 11123.7 (not observed)
       novel: 1288.8; 9813.9 (not observed)
(−)/T  affected: AGT 999.6 (not unique)
       novel: 943.6

mutation #3 [A −> T on (+) strand/T −> A on (−) strand]
(+)/C  affected: GAAGGTTTTGATTTC (SEQ ID NO: 24) 4723.8
       novel: 4684.8
(+)/T  affected: ACGAAGGT 2637.6
       novel: 1618.0; 1015.6
(−)/C  affected: TTC 904.5 (not unique)
       novel: 943.6 (not observed)
(−)/T  affected: CAAAACCT 2509.6; T 325.2 (<3-mer)
       novel: 2838.8

mutation #4 [A −> T on (+) strand/T −> A on (−) strand]
(+)/C  affected: GAAGGTTTTGATTTC (SEQ ID NO: 24) 4723.8
       novel: 4684.8
(+)/T  affected: ACGAAGGT 2637.6
       novel: 1288.8; 1344.8 (not unique)
(−)/C  affected: TTC 904.5 (not unique)
       novel: 943.6
(−)/T  affected: T 325.2 (<3-mer); CGT 959.6 (not unique)
       novel: 1288.8

mutation #5 [AC −> CG on (+) strand/GT −> CG on (−) strand]
(+)/C  affected: AAC 982.6 (not unique); TAC 943.6 (not unique)
       novel: 653.4 (<3-mer); 1288.8
(+)/T  affected: ACAACAACT 2838.8
       novel: 2854.8 (not resolved)
(−)/C  affected: GTAGTTGTTGTATTTGTGTGGGTAAGAATTGGATC (SEQ ID NO: 20) 11123.7
       novel: 1288.8; 9868.9
(−)/T  affected: AGT 999.6 (not unique); T 325.2 (<3-mer)
       novel: 1288.8

mutation #6 [AAA −> CAG on (+) strand/TTT −> CTG on (−) strand]
(+)/C  affected: AAATAC 1931.2
       novel: 324.2 (<3-mer); 1618.0
(+)/T  affected: ACCCACACAAAT (SEQ ID NO: 25) 3746.4
       novel: 3722.3
(−)/C  affected: GTAGTTGTTGTATTTGTGTGGGTAAGAATTGGATC (SEQ ID NO: 20) 11123.7
       novel: 4104.4; 7108.3
(−)/T  affected: AT 654.4 (<3-mer); T 325.2 (<3-mer); T 325.2 (<3-mer); GT 670.4 (<3-mer)
       novel: 943.6; 1015.6

mutation #7 [AAT −> GCG on (+) strand/ATT −> CGC on (−) strand]
(+)/C  affected: AATTC 1562.9
       novel: 669.4 (<3-mer); 959.6
(+)/T  affected: CCAAT 1562.0; T 325.2 (<3-mer)
       novel: 1883.1
(−)/C  affected: GTAGTTGTTGTATTTGTGTGGGTAAGAATTGGATC (SEQ ID NO: 20) 11123.7
       novel: 8904.3; 669.4 (<3-mer); 1634.0
(−)/T  affected: AAGAAT 1987.2; T 325.2 (<3-mer); GGAT 1344.8
       novel: 3601.2

mutation #8 [AAC, AAC, TAC −> AAG, AAT, TTC on (+) strand/GTA, GTT, GTT −> GAA, ATT, CTT on (−) strand]
(+)/C  affected: AAC 982.6; AAC 982.6; TAC 943.6 (not unique)
       novel: 2856.7
(+)/T  affected: ACAACAACT 2838.8; ACGAAGGT 2637.6
       novel: 2605.6; 325.2 (<3-mer); 325.2 (<3-mer); 2308.4
(−)/C  affected: GTAGTTGTTGTATTTGTGTGGGTAAGAATTGGATC (SEQ ID NO: 20) 11123.7
       novel: 2237.4; 8888.3
(−)/T  affected: CGT 959.6 (not unique); AGT 999.6 (not unique); GT 670.4 (<3-mer)
       novel: 1947.2; 614.4 (<3-mer)

TABLE V U-specific cleavage of a 972 nucleotides long T7 transcript. The predicted digestion products, 222 in total, are grouped according to their composition. An asterisk indicates those peaks for which a companion cyclic phosphate reaction intermediate is observed (FIG. 9). The largest fragment is absent from the obtained spectrum; a few other cleavage products appear as minor peaks and are labeled ‘weak’.

Composition         M + H      Length   Number   Remarks
T                   325.2      1        47
CT                  614.4      2        11
AT                  654.4      2        14
GT                  670.4      2        15
C₂T                 903.5      3        4
ACT                 943.6      3        3
CGT                 959.6      3        7
A₂T                 983.6      3        5
AGT                 999.6      3        1
G₂T                 1015.6     3        4
C₃T                 1192.7     4        2        *
AC₂T                1232.7     4        5        *
C₂GT                1248.7     4        4
A₂CT                1272.8     4        3
ACGT                1288.8     4        6
CG₂T                1304.8     4        5
A₃T                 1312.8     4        1        weak
A₂GT                1328.8     4        1        weak
AG₂T                1344.8     4        5
AC₃T                1521.9     5        1
C₃GT                1537.9     5        2
A₂C₂T               1562.0     5        2
AC₂GT               1578.0     5        2        *
C₂G₂T               1594.0     5        7        *
A₂CGT               1618.0     5        1        weak
ACG₂T               1634.0     5        3        *
CG₃T                1650.0     5        3
A₃GT                1658.0     5        2
A₂G₂T               1674.0     5        2
G₄T                 1706.0     5        1        *
C₅T                 1771.1     6        1        *
C₄GT                1827.1     6        1        *
AC₃GT               1867.1     6        2        *
C₃G₂T               1883.1     6        2
A₃C₂T               1891.2     6        1
AC₂G₂T              1923.2     6        2        *
C₂G₃T               1939.2     6        1
A₄GT                1987.2     6        1
AC₄GT               2156.3     7        1
C₄G₂T               2172.3     7        1
A₂C₃GT              2196.3     7        1
AC₃G₂T              2212.3     7        2
A₃C₂GT              2236.4     7        1
A₂C₂G₂T             2252.4     7        2
C₂G₄T               2284.4     7        1
A₃CG₂T              2292.4     7        1
A₂CG₃T              2308.4     7        2
ACG₄T               2324.4     7        1
AC₅GT               2445.5     8        1        *
A₂C₂G₃T             2597.6     8        1
A₄CG₂T              2621.6     8        1
A₂CG₄T              2653.6     8        1
A₄C₃GT              2854.8     9        1
A₆C₂T               2878.8     9        1
A₂C₃G₃T             2886.8     9        1
A₂CG₄T (5'ppp-)     2893.6     8        1
A₃C₂G₃T             2926.8     9        1
C₈GT                2983.9     10       1        weak *
A₂C₅G₂T             3119.9     10       1
A₃C₃G₃T             3216.0     10       1        *
A₂C₃G₄T             3232.0     10       1
A₃C₂G₄T             3272.0     10       1
A₂C₂G₅T             3288.0     10       1
A₅C₅T               3417.1     11       1
A₅C₃G₂T             3529.2     11       1
A₆CG₃T              3625.2     11       1
AC₃G₇T              3938.4     12       1
A₂C₇G₃T             4043.5     13       1
A₃C₅G₃T             4139.6     12       1
A₅C₃G₄T             4219.6     13       1
A₄C₂G₆T             4291.6     13       1
A₃C₈G₃T             4661.9     15       1
A₅C₄G₅T             4854.0     15       1
A₉C₃G₄T             5536.4     17       1
A₆C₆G₆T             6106.8     19       1
A₄C₇G₁₃T            8154.0     25       1
A₁₃C₈G₁₀T           10370.5    32       1        not observed
                                        Σ = 222
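
The M + H values in Table V depend only on fragment composition, so they can be recomputed from per-residue masses. The sketch below is illustrative only. It assumes average residue masses (nucleoside monophosphate minus water) for AMP, GMP, dCMP and UMP, a 5′-OH/3′-phosphate fragment, and one added hydrogen for the singly charged ion. These constants and the function name `mh_plus` are assumptions introduced here, not values stated in the text, although they reproduce the tabulated entries to within about 0.1 Da.

```python
# Assumed average residue masses (monophosphate minus H2O), in Da.
# "T" is the label used in Table V for the 3'-terminal uridine; "C" is taken
# as a 2'-deoxy-C residue, which is what the tabulated mass differences imply.
RESIDUE = {"A": 329.21, "G": 345.21, "C": 289.18, "T": 306.17}
WATER = 18.015            # closes the chain: 5'-OH and 3'-phosphate ends
HYDROGEN = 1.008          # added for the singly protonated ion
TRIPHOSPHATE = 3 * 79.98  # assumed mass of a 5'-triphosphate (3 x HPO3)


def mh_plus(composition, five_prime_triphosphate=False):
    """Return the [M + H]+ value for a fragment given as {base: count}."""
    mass = sum(RESIDUE[base] * count for base, count in composition.items())
    mass += WATER + HYDROGEN
    if five_prime_triphosphate:
        mass += TRIPHOSPHATE
    return mass


# Spot checks against Table V
print(round(mh_plus({"T": 1}), 1))
print(round(mh_plus({"A": 1, "C": 1, "T": 1}), 1))
print(round(mh_plus({"A": 13, "C": 8, "G": 10, "T": 1}), 1))
print(round(mh_plus({"A": 2, "C": 1, "G": 4, "T": 1},
                    five_prime_triphosphate=True), 1))
```

Since the mass is a function of composition alone, fragments that are sequence isomers share one peak, which is why the table groups the 222 products by composition and reports a count for each.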

1. A method for sequencing one or more target nucleic acids present in one or more biological samples, said method comprising the steps of: (a) deriving from one or more biological samples the one or more target nucleic acids; (b) subjecting the one or more target nucleic acids obtained from step (a) to two or more separate base-specific, sequence-specific or site-specific complementary cleavage reactions, wherein each cleavage reaction generates a non-ordered set of fragments; (c) analyzing the sets of non-ordered fragments obtained from step (b) by mass spectrometry; and, (d) performing a systematic computational analysis on the mass spectra obtained from step (c) and sequencing said target nucleic acid; wherein said complementary cleavage reactions refer to target nucleic acid digestions characterized by varying specificity and/or to digestion of alternative forms of the target sequence.
2. The method according to claim 1 wherein the one or more biological samples are derived from an organism selected from the group consisting of eukaryotes, prokaryotes, and viruses.
3. The method according to claim 1 wherein the one or more target nucleic acids are selected from the group consisting of single stranded DNA, double stranded DNA, cDNA, single stranded RNA, double stranded RNA, DNA/RNA hybrid, and DNA/RNA mosaic nucleic acid.
4. The method according to claim 1 wherein one or more target nucleic acids are derived by one or more consecutive amplification procedures selected from the group consisting of in vivo cloning, polymerase chain reaction (PCR), reverse transcription followed by the polymerase chain reaction (RT-PCR), strand displacement amplification (SDA), and transcription based processes.
5. The method according to claim 4 wherein the one or more amplified target nucleic acids are transcripts generated from a single stranded or a double stranded target nucleic acid by a process comprising the steps of: (a) operatively linking a transcription control sequence to the one or more target nucleic acids; and (b) transcribing one or both strands of the one or more target nucleic acids of step (a) using one or more RNA polymerases that recognize the transcription control sequence on the one or more target nucleic acids.
6. The method according to claim 5 wherein said transcriptional control sequence is operatively linked to one or more target nucleic acids by PCR amplification using a primer that incorporates the transcriptional control sequence as a 5′-extension.
7. The method according to claim 5 wherein the transcription control sequence is selected from the group consisting of a eukaryotic transcription control sequence, a prokaryotic transcription control sequence, and a viral transcription control sequence.
8. The method according to claim 7 wherein the prokaryotic transcription control sequence is selected from the group consisting of T3, T7, and SP6 promoters.
9. The method according to claim 8 wherein the RNA polymerases which utilize the T3, T7, or SP6 promoters are either wild type or mutant RNA polymerases, the mutant polymerases being capable of incorporating into the transcript non-canonical substrates with a 2′-deoxy, 2′-O-methyl, 2′-fluoro or 2′-amino substituent.
10. The method according to claim 9 wherein the mutant RNA polymerase is either T7 mutant polymerase or SP6 mutant polymerase.
11. The method according to claim 1 wherein the derived target nucleic acid incorporates one or more nucleosides that are modified on the base, the sugar, and/or the phosphate moiety, wherein the modifications alter the specificity of cleavage by the one or more cleavage reagents and/or the mass and/or the length of the cleavage products.
12. The method according to claim 11 wherein the modification is introduced through the enzymatic incorporation of a modified deoxynucleoside triphosphate, a modified ribonucleoside triphosphate, and/or a modified dideoxynucleoside triphosphate; or wherein the modification is introduced chemically, or wherein the modification is introduced through a combination of both methods.
13. The method according to claim 11 wherein the modification consists of a 2′-deoxy, 2′-O-methyl, 2′-fluoro or 2′-amino substituent on the nucleotide triphosphates.
14. The method according to claim 11 wherein the modification consists of phosphorothioate internucleoside linkages or phosphorothioate internucleoside linkages further reacted with an alkylating reagent.
15. The method according to claim 11 wherein the modification consists of a methyl group on C5 of the uridine-5′-monophosphate subunits.
16. The method according to claim 11 wherein the modification consists of nucleotides that incorporate alternative isotopes.
17. The method according to claim 1 wherein the one or more target nucleic acids of step (a) are purified prior to cleavage.
18. The method according to claim 17 wherein said purification is achieved through immobilization or by chromatography.
19. The method according to claim 1 wherein the complementary cleavage reactions are selected from the group consisting of enzymatic cleavage, chemical cleavage, and physical cleavage.
20. The method according to claim 19 wherein the complementary cleavage reactions are characterized by a relaxed mono-nucleotide, mono-nucleotide, relaxed di-nucleotide, or di-nucleotide specificity.
21. The method according to claim 19 wherein the one or more target nucleic acids are subjected to a chemical digestion reaction consisting of treatment with alkali or with reagents used in the Maxam & Gilbert sequencing method.
22. The method according to claim 19 wherein the one or more target nucleic acids are subjected to an enzymatic cleavage reaction using one or more enzymes selected from the group consisting of endonucleases and exonucleases.
23. The method according to claim 22 wherein the one or more target nucleic acids are subjected to an enzymatic cleavage reaction using one or more endonucleases, selected from the group consisting of restriction enzymes, RNA endonucleases, DNA endonucleases and non-specific phosphodiesterases.
24. The method according to claim 23 wherein the one or more endonucleases are one or more selective or non-selective RNA endonucleases, selected from the group consisting of the G-specific T1 ribonuclease, the A-specific U2 ribonuclease, the A/U specific phyM ribonuclease, the U/C specific ribonuclease A, the C-specific chicken liver ribonuclease (RNaseCL3) and cusativin, non-specific RNase-I, and pyrimidine-adenosine preferring RNases isolated from E. coli, Enterobacter sp., or Saccharomyces cerevisiae.
25. The method according to claim 1 wherein the one or more target nucleic acids are phosphorothioate-modified single stranded DNA or RNA, and wherein the cleavage reactions are performed with the nuclease P1.
26. The method according to claim 1 wherein the one or more target nucleic acids are mosaic RNA/DNA nucleic acids or modified mosaic RNA/DNA nucleic acids, prepared with mutant polymerases, and wherein the cleavage reagents are RNA endonucleases, DNA endonucleases or alkali.
27. The method according to claim 1 wherein the one or more target nucleic acids are transcripts, modified transcripts, mosaic RNA/DNA transcripts or modified mosaic RNA/DNA transcripts, prepared with wild type or mutant RNA polymerases, and wherein the cleavage reagents are one or more selective or non-selective RNA endonucleases or alkali.
28. The method according to claim 1 wherein the one or more target nucleic acids are mosaic RNA/DNA transcripts that incorporate either dCMP, dUMP or dTMP, prepared with mutant T7 or SP6 polymerase, and wherein the cleavage reagent is a pyrimidine-specific RNase.
29. The method according to claim 1 further comprising ion-exchange chromatographic purification of the set of non-ordered fragments of step (b).
30. The method according to claim 1 wherein the set of non-ordered fragments of step (b) is spotted onto a solid support.
31. The method according to claim 30 wherein said solid support is chosen from a group consisting of solid surfaces, plates and chips.
32. The method according to claim 1 wherein the mass spectrometric analysis of the nucleic acid fragments is performed using a mass spectrometric method selected from the group consisting of Matrix-Assisted Laser Desorption/Ionization-Time-of-flight (MALDI-TOF), Electrospray-Ionization (ESI), and Fourier Transform-Ion Cyclotron Resonance (FT-ICR).
33. The method according to claim 1, wherein said method is used for re-sequencing of one or more target nucleic acids for which a reference nucleic acid sequence is known; said method comprising an additional step wherein the one or more mass spectra of the non-ordered fragments obtained in step (c) are compared with the known or predicted mass spectra for a reference nucleic acid sequence, and deducing therefrom, by systematic computational analysis, all or part of the nucleotide sequence of the one or more target nucleic acids, and comparing the deduced nucleic acid sequence with the reference nucleic acid to determine whether the one or more target nucleic acids have the same sequence or a different sequence from the reference nucleic acid.
34. The method according to claim 33 wherein the nucleic acid sequence difference that is determined is a deletion, substitution, insertion or combinations thereof.
35. The method according to claim 34 wherein the nucleic acid sequence difference is a Single Nucleotide Polymorphism (SNP).
36. The method according to claim 33 wherein said method identifies known as well as unknown nucleotide sequence variations of said one or more target nucleic acids present in said one or more biological samples.
37. The method according to claim 36 wherein determination of said known or unknown nucleotide sequence variations allows the identification of the various allelic sequences of a certain region/gene, the scoring of disease-associated mutations, the detection of somatic variations, or studies in the field of molecular evolution.
38. The method according to claim 1 wherein the spectra obtained for one or more target nucleic acids are compared with the mass spectra predicted for a plurality of reference nucleic acids thereby identifying/detecting one or more target nucleic acids in one or more biological samples.
39. The method according to claim 38 wherein said method produces an expression profile of one or more biological samples.
40. A method according to claim 1 for sequencing of one or more target nucleic acids of unknown sequence present in one or more biological samples, said method comprising the steps of: (a) deriving from one or more biological samples one or more target nucleic acids in a single stranded form; (b) subjecting the one or more target nucleic acids obtained from step (a) to a set of four separate base-specific complementary cleavage reactions, wherein each cleavage reaction generates a non-ordered set of fragments; (c) analyzing the sets of non-ordered fragments obtained from step (b) by mass spectrometry; (d) performing a systematic computational analysis on the mass spectra obtained from step (c) to assemble the sequence of said target nucleic acid; and, (e) optionally, if the sequence is not uniquely defined after step (d), repeating steps (a) through (d), thereby generating modified forms of said target nucleic acid and/or different portions of said target nucleic acid, and performing supplementary mono- and/or di-nucleotide specific cleavage reactions rendering supplementary sets of non-ordered fragments until the combined data converge into a unique sequence solution, wherein said complementary cleavage reactions refer to target nucleic acid digestions characterized by varying specificity and/or to digestion of alternative forms of the target sequence.
41. The method according to claim 36 wherein said method provides genome wide genotyping of one or more biological samples.
42. The method of claim 28, wherein the pyrimidine-specific RNase is RNase A.
43. The method of claim 1 or 33, wherein said method comprises four RNase-specific cleavage reactions.
44. The method of claim 43, wherein said four RNase-specific cleavage reactions comprise RNase T1 and RNase U2 cleavage of the + and − strands of said target nucleic acid.
45. The method of claim 43, wherein said four RNase-specific cleavage reactions comprise RNase A or RNase A and RNase T1 cleavage of the + and − strands of said target nucleic acid.
46. A kit for sequence analysis mass spectrometry re-sequencing according to a method of claim 33 of one or more target nucleic acids for which a reference nucleic acid sequence is known in one or more biological samples using mass spectrometry, the kit comprising: (a) one or more nucleotide triphosphates; (b) one or more polymerases; (c) one or more nucleic acid cleaving agents; (d) one or more sets of reference nucleic acids for which the nucleic acid sequence is known; (e) reagents to purify the target nucleic acid; (f) ion exchange beads in order to purify the non-ordered set of fragments; (g) a solid support suitable for use in mass spectrometry analysis whereon the non-ordered set of fragments may be spotted; and, (h) computer software for comparing the mass spectra of the one or more target nucleic acids with the mass spectra of the reference nucleic acid and deducing therefrom the nucleic acid sequence of the target nucleic acid.
47. The kit of claim 46 wherein said cleaving agent is an endonuclease selected from the group consisting of U/C specific RNase A, G-specific T1 ribonuclease, A-specific U2 ribonuclease, A/U specific phyM ribonuclease, C-specific chicken liver ribonuclease (RNaseCL3) and cusativin.
48. The kit of claim 46, wherein said kit comprises: (a) four nucleotide triphosphates; (b) a T7 or SP6 polymerase; (c) RNase T1 and RNase U2; (d) one or more sets of reference nucleic acids for which the nucleic acid sequence is known; (e) reagents to purify the target nucleic acid; (f) ion exchange beads in order to purify the non-ordered set of fragments; (g) a solid support suitable for use in mass spectrometry analysis whereon the non-ordered set of fragments may be spotted; and (h) computer software for comparing the mass spectra of the one or more target nucleic acids with the mass spectra of the reference nucleic acid and deducing therefrom the nucleic acid sequence of the target nucleic acid.
49. A kit for sequence analysis mass spectrometry sequencing according to a method of claim 1 of one or more unknown target nucleic acids in one or more biological samples using mass spectrometry, the kit comprising: (a) one or more nucleotide triphosphates; (b) one or more polymerases; (c) one or more nucleic acid cleaving agents; (d) optionally, reagents to purify the target nucleic acid; (e) ion exchange beads in order to purify the non-ordered set of fragments; (f) a solid support suitable for use in mass spectrometry analysis whereon the non-ordered set of fragments may be spotted; and, (g) computer software for analysing the mass spectra of the sequence of said target nucleic acid resulting in one or more unique sequences.
50. The kit of claim 49 wherein said cleaving agent is an endonuclease selected from the group consisting of U/C specific RNase A, G-specific T1 ribonuclease, A-specific U2 ribonuclease, A/U specific phyM ribonuclease, C-specific chicken liver ribonuclease (RNaseCL3) and cusativin.
51. The kit of claim 46 or 49, wherein said one or more polymerases are SP6 and T7 RNA polymerase and said cleaving agent is an endonuclease selected from the group consisting of U/C specific RNase A, G-specific T1 ribonuclease, A-specific U2 ribonuclease, A/U specific phyM ribonuclease, C-specific chicken liver ribonuclease (RNaseCL3) and cusativin.
52. The kit of claim 49, wherein said kit comprises: (a) four nucleotide triphosphates; (b) a T7 or SP6 polymerase; (c) RNase T1 and RNase U2; (d) reagents to purify the target nucleic acid; (e) ion exchange beads in order to purify the non-ordered set of fragments; (f) a solid support suitable for use in mass spectrometry analysis whereon the non-ordered set of fragments may be spotted; and, (g) computer software for analysing the mass spectra of the sequence of said target nucleic acid resulting in one or more unique sequences.
53. The kit of claim 48 or 52, wherein said T7 or SP6 polymerase is a mutant polymerase that incorporates non-canonical substrates with a 2′-deoxy, 2′-O-methyl, 2′-fluoro or 2′-amino substituent into the transcript.
54. A kit for mass spectrometry sequencing according to a method of claim 1 or claim 33 of one or more target nucleic acids in one or more biological samples using mass spectrometry, the kit comprising: (a) one or more ribonucleotide triphosphates and one or more deoxyribonucleotide triphosphates; (b) one or more polymerases; (c) one or more RNAses; (d) a solid support suitable for use in mass spectrometry analysis whereon the non-ordered set of fragments may be spotted; and (e) computer software for analysing the mass spectra of the sequence of said target nucleic acid resulting in one or more unique sequences.
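
The comparison step recited in claim 33, in which the spectra of the non-ordered fragments are compared with the spectra predicted for a reference sequence, can be illustrated with a small peak-matching routine. This is a simplified sketch under stated assumptions, not the systematic computational analysis of the claims. The function name `compare_peaks`, the 0.5 Da tolerance and the abbreviated peak lists are hypothetical. The example values are taken from the (+)/T entry for mutation #3 in Table IV, where the reference fragment at 2637.6 is replaced by novel fragments at 1618.0 and 1015.6.

```python
def compare_peaks(reference, observed, tol=0.5):
    """Return (lost, novel) peaks between a reference and an observed peak list.

    lost  : reference masses with no observed peak within `tol` Da
    novel : observed masses with no reference peak within `tol` Da
    """
    lost = [r for r in reference
            if not any(abs(r - o) <= tol for o in observed)]
    novel = [o for o in observed
             if not any(abs(o - r) <= tol for r in reference)]
    return lost, novel


# Abbreviated, illustrative peak lists for a (+) strand T-reaction.
reference_peaks = [943.6, 999.6, 2637.6, 2838.8]
observed_peaks = [943.6, 999.7, 1015.6, 1618.0, 2838.8]  # assumed measurement

lost, novel = compare_peaks(reference_peaks, observed_peaks)
print("lost reference peaks:", lost)
print("novel peaks:", novel)
```

A lost peak accompanied by one or more novel peaks in a given reaction narrows a sequence change to the affected reference fragment; combining the results of the complementary reactions is what allows the change to be located and identified.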